Abstract



Splicing as a binary word/language operation is inspired by the DNA recombination under 
the action of restriction enzymes and ligases, and was first introduced by Tom Head in 1987. 
Shortly thereafter, it was proven that the languages generated by (finite) splicing systems 
form a proper subclass of the class of regular languages. However, the question of whether
or not one can decide if a given regular language is generated by a splicing system remained 
open. In this paper we give a positive answer to this question. Namely, we prove that, if 
a language is generated by a splicing system, then it is also generated by a splicing system 
whose size is a function of the size of the syntactic monoid of the input language, and which 
can be effectively constructed. 

1 Introduction 

In [10] Head described a language-theoretic operation, called splicing, which models DNA recombination, a cut-and-paste operation on DNA double-strands. Recall that a DNA single-strand is a polymer consisting of a series of the nucleobases Adenine (A), Cytosine (C), Guanine (G), and Thymine (T) attached to a linear, directed backbone. Due to the chemical structure of the backbone, the ends of a single-strand are called 3'-end and 5'-end. Abstractly, a DNA single-strand can be viewed as a string over the four-letter alphabet {A, C, G, T}. The bases A and T, respectively C and G, are Watson-Crick-complementary, or simply complementary, which means they can attach to each other via hydrogen bonds. The complement of a DNA single-strand $\alpha = 5'\text{-}a_1 \cdots a_n\text{-}3'$ is the strand $\bar\alpha = 3'\text{-}\bar a_1 \cdots \bar a_n\text{-}5'$, where $a_1, \dots, a_n$ are bases and $\bar a_1, \dots, \bar a_n$ denote their complementary bases, respectively; note that $\alpha$ and $\bar\alpha$ have opposite orientation. A strand $\alpha$ and its complement $\bar\alpha$ can bond to each other to form a DNA (double-)strand.

Splicing is meant to abstract the action of two compatible restriction enzymes and the ligase enzyme on two DNA double-strands. The first restriction enzyme recognizes a base-sequence $u_1v_1$, called its restriction site, in any DNA string, and cuts the string containing this factor between $u_1$ and $v_1$. The second restriction enzyme, with restriction site $u_2v_2$, acts similarly. Assuming that the sticky ends obtained after these cuts are complementary, the enzyme ligase then aids the recombination (catenation) of the first segment of one cut string with the second segment of another cut string. For example, the enzyme TaqI has restriction site TCGA, and the enzyme SciNI has restriction site GCGC. The enzymes cut the double-strands

$$\begin{array}{l} 5'\text{-}\alpha\text{-}\mathtt{T|CGA}\text{-}\beta\text{-}3' \\ 3'\text{-}\bar\alpha\text{-}\mathtt{AGC|T}\text{-}\bar\beta\text{-}5' \end{array} \qquad\text{and}\qquad \begin{array}{l} 5'\text{-}\gamma\text{-}\mathtt{G|CGC}\text{-}\delta\text{-}3' \\ 3'\text{-}\bar\gamma\text{-}\mathtt{CGC|G}\text{-}\bar\delta\text{-}5' \end{array}$$

at the positions marked by |, respectively, leaving the first segment of the left strand with a sticky end GC which is compatible with the sticky end CG of the second segment of the right strand. The segments can be recombined to form either the original strands or the new strand

$$\begin{array}{l} 5'\text{-}\alpha\text{-}\mathtt{TCGC}\text{-}\delta\text{-}3' \\ 3'\text{-}\bar\alpha\text{-}\mathtt{AGCG}\text{-}\bar\delta\text{-}5' \end{array}$$

A splicing system is a formal language model which consists of a set of initial words or axioms I and a set of splicing rules R. The most commonly used definition for a splicing rule is a quadruple of words $r = (u_1, v_1; u_2, v_2)$. This rule splices two words $x_1u_1v_1y_1$ and $x_2u_2v_2y_2$: the words are cut between the factors $u_1, v_1$, respectively $u_2, v_2$, and the prefix (the left segment) of the first word is recombined by catenation with the suffix (the right segment) of the second word, yielding $x_1u_1v_2y_2$; see Figure 1 and also [18]. A splicing system generates a language which contains every word that can be obtained by successively applying rules to axioms and the intermediately produced words.

Figure 1: Splicing of the words $x_1u_1v_1y_1$ and $x_2u_2v_2y_2$ by the rule $r = (u_1, v_1; u_2, v_2)$.



Example 1.1. Consider the splicing system $(I, R)$ with axiom $I = \{ab\}$ and rules $R = \{r, s\}$ where $r = (a, b; \varepsilon, ab)$ and $s = (ab, \varepsilon; a, b)$; in this paper, $\varepsilon$ denotes the empty word. Applying the rule r to two copies of the axiom ab creates the word aab and applying the rule s to two copies of the axiom ab creates the word abb. More generally, the rule r or s can be applied to two copies of a word $a^kb^\ell$ with $k, \ell \ge 1$ in order to create the word $a^{k+1}b^\ell$ or $a^kb^{\ell+1}$, respectively. The language generated by the splicing system $(I, R)$ is $L(I, R) = a^+b^+$.
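Example 1.1 can be replayed mechanically. The following is a minimal sketch, not taken from the paper: words are Python strings, a quadruple rule $(u_1, v_1; u_2, v_2)$ is a 4-tuple, and the helper names `cut_points`, `splice`, and `generate` are hypothetical. Since the generated language is infinite, `generate` only keeps words up to a length bound `max_len`, which in general under-approximates $L(I, R)$; for Example 1.1 every word of $a^+b^+$ up to the bound is reached through shorter words, so the result is exact there.

```python
# Sketch of classic splicing with quadruple rules r = (u1, v1; u2, v2).
# Helper names are illustrative only.

def cut_points(w: str, u: str, v: str) -> list[int]:
    """Positions c of w such that u ends at c and v starts at c."""
    return [c for c in range(len(u), len(w) - len(v) + 1)
            if w[c - len(u):c] == u and w[c:c + len(v)] == v]


def splice(w1: str, w2: str, rule: tuple[str, str, str, str]) -> set[str]:
    """All words x1 u1 v2 y2 with w1 = x1 u1 v1 y1 and w2 = x2 u2 v2 y2."""
    u1, v1, u2, v2 = rule
    return {w1[:c1] + w2[c2:]
            for c1 in cut_points(w1, u1, v1)
            for c2 in cut_points(w2, u2, v2)}


def generate(axioms: set[str], rules: list, max_len: int) -> set[str]:
    """Splice to a fixed point, discarding words longer than max_len
    (so, in general, an under-approximation of L(I, R))."""
    lang = {w for w in axioms if len(w) <= max_len}
    while True:
        new = {z for r in rules for w1 in lang for w2 in lang
               for z in splice(w1, w2, r) if len(z) <= max_len}
        if new <= lang:
            return lang
        lang |= new


# Example 1.1: I = {ab}, r = (a, b; eps, ab), s = (ab, eps; a, b)
example = generate({"ab"}, [("a", "b", "", "ab"), ("ab", "", "a", "b")], 5)
print(sorted(example, key=lambda w: (len(w), w)))
```

The same helper also reproduces the marker example of the next paragraph: `generate({"b", "baa"}, [("baa", "", "b", "")], 7)` yields the words of $b(aa)^*$ up to length 7.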

The most natural variant of splicing systems, often referred to as finite splicing systems, is to consider a finite set of axioms and a finite set of rules. In this paper, by a splicing system we always mean a finite splicing system. Shortly after the introduction of splicing in formal language theory, Culik II and Harju [6] proved that splicing systems generate only regular languages; see also [12,17]. Gatterdam [7] gave $(aa)^*$ as an example of a regular language which cannot be generated by a splicing system; thus, the class of languages generated by splicing systems is strictly included in the class of regular languages. However, for any regular language L over an alphabet $\Sigma$, adding a marker $b \notin \Sigma$ to the left side of every word in L results in the language $bL$ which can be generated by a splicing system [11]; e.g., the language $b(aa)^*$ is generated by the axioms $\{b, baa\}$ and the rule $(baa, \varepsilon; b, \varepsilon)$.

This led to the question of whether or not one of the known subclasses of the regular languages corresponds to the class $\mathcal{S}$ of languages which can be generated by a splicing system. All investigations to date indicate that the class $\mathcal{S}$ does not coincide with another naturally defined language class. A characterization of reflexive splicing systems using Schützenberger constants has been given by Bonizzoni, de Felice, and Zizza [1-3]. A splicing system is reflexive if for all rules $(u_1, v_1; u_2, v_2)$ in the system we have that $(u_1, v_1; u_1, v_1)$ and $(u_2, v_2; u_2, v_2)$ are rules in the system, too. A word v is a Schützenberger constant of a language L if $x_1vy_1 \in L$ and $x_2vy_2 \in L$ imply $x_1vy_2 \in L$ [19]. Recently, it was proven by Bonizzoni and Jonoska that every splicing language has a constant [5]. However, not all languages which have a constant are generated by splicing systems; e.g., in the language $L = (aa)^* + b^*$ every word $b^i$ with $i \ge 1$ is a constant, but L is not generated by a splicing system.

Another approach was to find an algorithm which decides whether a given regular language 
is generated by a splicing system. This problem has been investigated by Goode, Head, and 
Pixton [8,9, 13] but it has only been partially solved: it is decidable whether a regular language 
is generated by a reflexive splicing system. It is worth mentioning that a splicing system by the 
original definition in [10] is always reflexive. A related problem has been investigated by Kim [16]: 
given a regular language L and a finite set of enzymes, represented by a set of reflexive rules R, Kim
showed that it is decidable whether or not L can be generated from a finite set of axioms by using 
only rules from R. 
In this paper we settle the decidability problem by proving that, for a given regular language, it is indeed decidable whether the language is generated by a splicing system (which is not necessarily reflexive), see Corollary 5.2. More precisely, for every regular language L there exists a splicing system $(I_L, R_L)$ such that, if L is a splicing language, then L is generated by $(I_L, R_L)$. The size of this splicing system depends on the size of the syntactic monoid of L. If m is the size of the syntactic monoid of L, then all axioms in $I_L$ and the four components of every rule in $R_L$ have length in $O(m^2)$, see Theorem 4.1. By results from [12,13], we can construct a finite automaton which accepts the language generated by $(I_L, R_L)$, compare it with a finite automaton which accepts L, and thus decide whether L is generated by a splicing system. Furthermore, we prove a similar result for a more general variant of splicing that has been introduced by Pixton [17], see Theorem 3.1.

The paper is organized as follows. In Section 2 we lay down the notation, recall some well-known results about syntactic monoids, and prove a pumping argument that is of importance for the proofs in the succeeding sections. Section 3 (Section 4) contains the proof that a regular language L is generated by a Pixton splicing system (resp. classical splicing system) if and only if it is generated by one particular Pixton splicing system (resp. classical splicing system) whose size is bounded by the size of the syntactic monoid of L. Sections 3 and 4 can be read independently and overlap in some of their main ideas. The inclusion of both sections and the presentation order are chiefly for expository purposes: due to the features of Pixton splicing, Section 3 introduces the main ideas in a significantly more readable way. Finally, in Section 5 we deduce the decidability results for both splicing variants.

An extended abstract of this paper, including a shortened proof of Theorem 4.1 and Corollary 5.2 i.), has been published in the conference proceedings of DNA 18 in 2012 [15]. Theorem 3.1 and Corollary 5.2 ii.) have not been published elsewhere.

2 Notation and Preliminaries 

We assume the reader to be familiar with the fundamental concepts of language theory, see [14]. 

Let $\Sigma$ be a finite set of letters, the alphabet; let $\Sigma^*$ be the set of all words over $\Sigma$; and let $\varepsilon$ denote the empty word. A subset L of $\Sigma^*$ is a language over $\Sigma$. Throughout this paper, we consider languages over the fixed alphabet $\Sigma$ only. Let $w \in \Sigma^*$ be a word. The length of w is denoted by $|w|$. (We use the same notation for the cardinality $|S|$ of a set S, as usual.) We consider the letters of $\Sigma$ to be ordered and for words $u, v \in \Sigma^*$ we denote the length-lexicographical order by $u \le_{\mathrm{ll}} v$, i.e., $u \le_{\mathrm{ll}} v$ if either $|u| < |v|$, or $|u| = |v|$ and u is at most v in lexicographic order. The strict length-lexicographical order is denoted by $<_{\mathrm{ll}}$; we have $u <_{\mathrm{ll}} v$ if $u \le_{\mathrm{ll}} v$ and $u \ne v$.
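The length-lexicographical order is easy to realize as a sort key. The sketch below is illustrative only: it assumes single-character letters ordered by Unicode code point (the text fixes only *some* order on $\Sigma$), and the names `ll_key`, `ll_leq`, `ll_less`, and `words_up_to` are hypothetical. It also enumerates the finite set of all words that are at most a given word `mu` in this order, which is finite because every such word has length at most `len(mu)`.

```python
from itertools import product


def ll_key(w: str) -> tuple[int, str]:
    """Sort key: shorter words first, equal lengths compared lexicographically
    (assumption: letters are ordered by code point)."""
    return (len(w), w)


def ll_leq(u: str, v: str) -> bool:
    return ll_key(u) <= ll_key(v)


def ll_less(u: str, v: str) -> bool:
    return ll_leq(u, v) and u != v


def words_up_to(mu: str, alphabet: str) -> list[str]:
    """All words w over `alphabet` with w <=_ll mu, in increasing order."""
    result = []
    for n in range(len(mu) + 1):
        # product over the sorted alphabet yields words of length n in lexicographic order
        for letters in product(sorted(alphabet), repeat=n):
            w = "".join(letters)
            if ll_leq(w, mu):
                result.append(w)
    return result
```

For instance, `words_up_to("ab", "ab")` lists $\varepsilon, a, b, aa, ab$: the word $ba$ has the same length as $ab$ but is lexicographically larger.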

For a length bound $m \in \mathbb{N}$ we let $\Sigma^{\le m}$ denote the set of words whose length is at most m, i.e., $\Sigma^{\le m} = \bigcup_{i \le m} \Sigma^i$. Analogously, we define $\Sigma^{< m} = \bigcup_{i < m} \Sigma^i$.

If $w = xyz$ for some $x, y, z \in \Sigma^*$, then x, y, and z are called prefix, factor, and suffix of w, respectively. If a prefix or suffix of w is distinct from w, it is said to be proper.

Let $w = a_1 \cdots a_n$ where $a_1, \dots, a_n$ are letters from $\Sigma$. By $w[i]$ for $0 \le i \le n$ we denote a position in the word w: if $i = 0$, it is the position before the first letter $a_1$; if $i = n$, it is the position after the last letter $a_n$; and otherwise, it is the position between the letters $a_i$ and $a_{i+1}$. We want to stress that $w[i]$ is not a letter in the word w. By $w[i; j]$ for $0 \le i \le j \le n$ we denote the factor $a_{i+1} \cdots a_j$ which is enclosed by the positions $w[i]$ and $w[j]$. If $x = w[i; j]$ we say the factor x starts at position $w[i]$ and ends at position $w[j]$. Whenever we talk about a factor x of a word w we mean a factor starting (and ending) at a certain position, even if the word x occurs as a factor at several positions in w. Let $x = w[i; j]$ and $y = w[i'; j']$ be factors of w. We say the factors x and y match (in w) if $i = i'$ and $j = j'$; the factor x is covered by the factor y (in w) if $i' \le i \le j \le j'$; and the factors x and y overlap (in w) if $x \ne \varepsilon$, $y \ne \varepsilon$, and $i \le i' < j$ or $i' \le i < j'$. In other words, if two factors x and y overlap in w, then they share a common letter of w. Let $x = w[i; j]$ be a factor of w and let $p = w[k]$ be a position in w. We say the position p lies at the left of x if $k \le i$; the position p lies at the right of x if $k \ge j$; and the position p lies in x if $i < k < j$.
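Because positions sit between letters, $w[i; j]$ coincides with Python's half-open slice `w[i:j]`. The following sketch encodes a factor by its pair of positions $(i, j)$ and implements the relations just defined; the names are hypothetical and the encoding is our own choice, not the paper's.

```python
Interval = tuple[int, int]  # (i, j) stands for the factor w[i; j]


def factor(w: str, i: int, j: int) -> str:
    """w[i; j] = a_{i+1} ... a_j, the letters between positions w[i] and w[j]."""
    assert 0 <= i <= j <= len(w)
    return w[i:j]


def match(x: Interval, y: Interval) -> bool:
    return x == y


def covered(x: Interval, y: Interval) -> bool:
    """x is covered by y: i' <= i <= j <= j'."""
    (i, j), (i2, j2) = x, y
    return i2 <= i <= j <= j2


def overlap(x: Interval, y: Interval) -> bool:
    """Both factors non-empty and sharing at least one letter of w."""
    (i, j), (i2, j2) = x, y
    return i < j and i2 < j2 and (i <= i2 < j or i2 <= i < j2)


def position_relation(k: int, x: Interval) -> str:
    """Where position w[k] lies relative to the factor x = w[i; j]."""
    i, j = x
    if k <= i:
        return "left"
    if k >= j:
        return "right"
    return "in"
```

Note that adjacent factors such as $w[0; 1]$ and $w[1; 2]$ share a position but no letter, so they do not overlap.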

Every language L induces a syntactic congruence $\sim_L$ over words such that $u \sim_L v$ if and only if for all words x, y

$$xuy \in L \iff xvy \in L.$$

The syntactic class (with respect to L) of a word u is $[u]_L = \{v \mid u \sim_L v\}$. The syntactic monoid of L is the quotient monoid

$$M_L = \Sigma^*/{\sim_L} = \{[u]_L \mid u \in \Sigma^*\}.$$

It is well known that a language L is regular if and only if its syntactic monoid $M_L$ is finite. We will use two basic facts about syntactic monoids of regular languages.

Lemma 2.1. Let L be a regular language and let w be a word with $|w| \ge |M_L|^2$. We can factorize $w = \alpha\beta\gamma$ with $\beta \ne \varepsilon$ such that $\alpha \sim_L \alpha\beta$ and $\gamma \sim_L \beta\gamma$.

Proof. Consider a word w with $n = |w| \ge |M_L|^2$. For $i = 0, \dots, n$, let $X_i = [w[0; i]]_L$ be the syntactic classes of the prefixes of w and let $Y_i = [w[i; n]]_L$ be the syntactic classes of the suffixes of w. Note that $X_iY_i = [w]_L$. As there are $n + 1 > |M_L|^2$ pairs $(X_i, Y_i)$, the pigeonhole principle yields i, j with $0 \le i < j \le n$ such that $X_i = X_j$ and $Y_i = Y_j$. Let $\alpha = w[0; i]$, $\beta = w[i; j]$, and $\gamma = w[j; n]$. As $\alpha \in X_i$ and $\alpha\beta \in X_j$, we see that $\alpha \sim_L \alpha\beta$ and, symmetrically, $\gamma \sim_L \beta\gamma$. □
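The pigeonhole argument translates directly into code once syntactic classes are made concrete. The sketch below assumes L is given by a complete DFA (a hypothetical example DFA for $(aa)^*$ over $\{a, b\}$ is hard-coded) and represents the class of a word by its state transformation $q \mapsto \delta(q, w)$. Equal transformations imply syntactic congruence, so the factorization found is valid for any complete DFA; for the minimal DFA the transformations are exactly the syntactic classes. The names `transformation` and `factorize` are our own.

```python
# Hypothetical complete DFA for L = (aa)* over {a, b}:
# 0 = even number of a's (accepting), 1 = odd number of a's, 2 = sink after any b.
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
N_STATES = 3


def transformation(w: str, n_states: int = N_STATES, delta: dict = DELTA) -> tuple:
    """The map q -> delta(q, w) as a tuple; it stands for the class [w]_L."""
    f = list(range(n_states))
    for a in w:
        f = [delta[(q, a)] for q in f]
    return tuple(f)


def factorize(w: str, n_states: int = N_STATES, delta: dict = DELTA):
    """Return (alpha, beta, gamma) as in Lemma 2.1, or None if no pair (X_i, Y_i) repeats."""
    seen = {}
    for i in range(len(w) + 1):
        pair = (transformation(w[:i], n_states, delta),   # X_i
                transformation(w[i:], n_states, delta))   # Y_i
        if pair in seen:
            i0 = seen[pair]
            return w[:i0], w[i0:i], w[i:]
        seen[pair] = i
    return None
```

Here the monoid has three elements, so any word of length at least 9 is guaranteed to be factorized; shorter words may already work, e.g. `factorize("aaaa")` returns `("", "aa", "aa")`.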

Lemma 2.2. Let L be a regular language. Every element $X \in M_L$ contains a word $x \in X$ with $|x| < |M_L|$.

Proof. We define a series of sets $S_i \subseteq M_L$. We start with $S_0 = \{1\}$ (here, $1 = [\varepsilon]_L$) and let $S_{i+1} = S_i \cup \{X \cdot [a]_L \mid X \in S_i \wedge a \in \Sigma\}$ for $i \ge 0$. It is not difficult to see that $X \in S_i$ if and only if X contains a word $x \in X$ with $|x| \le i$. As $S_i \subseteq S_{i+1}$ and $M_L$ is finite, the series has a fixed point $S_n$ such that $S_i = S_n$ for all $i \ge n$. Let n be the least value with this property, i.e., $S_{n-1} \subsetneq S_n$ or $n = 0$. Observe that $n < |M_L|$ as $S_0 \subsetneq S_1 \subsetneq \cdots \subsetneq S_n$. Every element $X \in M_L$ contains some word $w \in X$; thus, $X \in S_{|w|} \subseteq S_n$. We conclude that X contains a word with a length of at most $n < |M_L|$. □
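The proof of Lemma 2.2 is a breadth-first search from $1 = [\varepsilon]_L$. A sketch, assuming L is given by its minimal complete DFA (so that state transformations are exactly the syntactic classes; this identification is the standard transition-monoid construction, not something stated in the text). Each newly discovered transformation receives the word that reached it, which is a shortest representative because BFS explores words by increasing length. The function name and the example DFA are hypothetical.

```python
from collections import deque

# Hypothetical minimal complete DFA for L = (aa)* over {a, b} (state 2 is a sink).
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}


def syntactic_monoid(n_states: int, alphabet: str, delta: dict) -> dict:
    """Map each element of M_L (as a state transformation) to a shortest word in it.
    The BFS layers correspond to the sets S_0 ⊆ S_1 ⊆ ... of the proof."""
    identity = tuple(range(n_states))
    shortest = {identity: ""}
    queue = deque([identity])
    while queue:
        f = queue.popleft()
        for a in alphabet:
            g = tuple(delta[(q, a)] for q in f)  # class of (word of f) followed by a
            if g not in shortest:
                shortest[g] = shortest[f] + a
                queue.append(g)
    return shortest


M = syntactic_monoid(3, "ab", DELTA)
print(len(M), sorted(M.values()))
```

For this language, $M_L$ has three elements with shortest representatives $\varepsilon$, $a$, and $b$, each of length less than $|M_L| = 3$, as Lemma 2.2 predicts.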

2.1 A Pumping Algorithm 

Consider a regular language L, a word $\alpha\beta\gamma$ where $\alpha \sim_L \alpha\beta$ and $\gamma \sim_L \beta\gamma$, due to Lemma 2.1, and a large even number j. In the proofs of Theorem 3.1 and Lemma 4.8, we need a pumping argument to replace all factors $\alpha\beta\gamma$ by $\alpha\beta^j\gamma$ in a word z in order to obtain a word $\hat z$; as $\alpha \sim_L \alpha\beta$ implies $\alpha \sim_L \alpha\beta^i$ for all $i \ge 0$, we have $\alpha\beta\gamma \sim_L \alpha\beta^j\gamma$ and thus $z \sim_L \hat z$. As $\alpha\beta\gamma$ may be a factor of $\alpha\beta^j\gamma$, we cannot ensure that $\alpha\beta\gamma$ is not a factor of $\hat z$. However, we can ensure that if $\alpha\beta\gamma = \hat z[k; k']$ is a factor of $\hat z$, then either (a) $\alpha\beta^{j/2}$ is a factor of $\hat z$ starting at position $\hat z[k]$ or (b) $\beta^{j/2}\gamma$ is a factor of $\hat z$ ending at position $\hat z[k']$; i.e., either α is succeeded by a large number of β's or γ is preceded by a large number of β's. The next lemma is a technical result whose purpose is to assure that for any word z there exists a word $\hat z$ such that the above-mentioned property holds and $\hat z$ is generated by applying several successive pumping steps $\alpha\beta\gamma \mapsto \alpha\beta^j\gamma$ to z.

Lemma 2.3. Let $z, \alpha, \beta, \gamma$ be words with $\beta \ne \varepsilon$, let $\ell = |\alpha\beta\gamma|$, and let $j \ge |z| + \ell$ be an even number. The following algorithm will terminate and output $\hat z$.

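1. $\hat z := z$;

2. if $\hat z[k; k+\ell] = \alpha\beta\gamma$ for some k such that neither

(a) $\alpha\beta^{j/2}$ is a factor of $\hat z$ starting at position $\hat z[k]$ nor
(b) $\beta^{j/2}\gamma$ is a factor of $\hat z$ ending at position $\hat z[k+\ell]$,

then let $\hat z := \hat z[0; k] \cdot \alpha\beta^j\gamma \cdot \hat z[k+\ell; |\hat z|]$; (replace the factor $\hat z[k; k+\ell] = \alpha\beta\gamma$ in $\hat z$ by $\alpha\beta^j\gamma$)

3. repeat step 2 until there is no such factor $\alpha\beta\gamma$ in $\hat z$ left.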
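The three steps above can be run literally on strings. This is a minimal sketch (the function name `pump` is hypothetical); it asserts the hypotheses of Lemma 2.3, under which the lemma guarantees termination, and restarts the left-to-right scan after every replacement, which is one admissible way to choose the factor in step 2.

```python
def pump(z: str, alpha: str, beta: str, gamma: str, j: int) -> str:
    """Sketch of the pumping algorithm of Lemma 2.3 on plain strings."""
    ell = len(alpha) + len(beta) + len(gamma)
    assert beta and j % 2 == 0 and j >= len(z) + ell, "hypotheses of Lemma 2.3"
    target = alpha + beta + gamma
    guard_a = alpha + beta * (j // 2)   # condition (a): starts at position k
    guard_b = beta * (j // 2) + gamma   # condition (b): ends at position k + ell
    zh = z                              # step 1
    while True:
        for k in range(len(zh) - ell + 1):
            end = k + ell
            if zh[k:end] != target:
                continue
            cond_a = zh.startswith(guard_a, k)
            cond_b = end >= len(guard_b) and zh[end - len(guard_b):end] == guard_b
            if not (cond_a or cond_b):
                # step 2: replace alpha beta gamma by alpha beta^j gamma
                zh = zh[:k] + alpha + beta * j + gamma + zh[end:]
                break                   # step 3: scan again from the left
        else:
            return zh                   # no unprotected factor alpha beta gamma is left
```

For example, with $\alpha = x$, $\beta = ab$, $\gamma = y$, and $j = 8$, the word $xaby$ is pumped once to $x(ab)^8y$, after which $xaby$ no longer occurs and the loop stops.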
Before we prove Lemma 2.3, let us recall a basic fact about primitive words. A word p is called primitive if there is no word x and $i \ge 2$ such that $p = x^i$. The primitive root of a word $w \ne \varepsilon$ is the unique primitive word p such that $w = p^i$ for some $i \ge 1$. For primitive p, it is well known that if $pp = xpy$, then either $x = p$ and $y = \varepsilon$, or $x = \varepsilon$ and $y = p$. In other words, whenever p is a factor of $p^n$ starting at position $p^n[i]$, then $i \in |p| \cdot \mathbb{N}$.
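Both facts can be checked on small examples. The sketch below (function names hypothetical) computes the primitive root as the shortest prefix whose power equals the word, and lists the start positions of a word inside its square; for a primitive word these are exactly $0$ and $|p|$, while a non-primitive word such as $abab$ also occurs at position 2.

```python
def primitive_root(w: str) -> str:
    """Shortest p with w = p^i; it is primitive and unique (w must be non-empty)."""
    assert w, "the primitive root is only defined for non-empty words"
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise AssertionError("unreachable: d = n always works")


def is_primitive(p: str) -> bool:
    return primitive_root(p) == p


def positions_in_square(p: str) -> list[int]:
    """Start positions of p as a factor of pp."""
    pp = p + p
    return [i for i in range(len(p) + 1) if pp.startswith(p, i)]
```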

For a word $w = xy$ we employ the notations $x^{-1}w = y$ and $wy^{-1} = x$. If x is not a prefix of w (y is not a suffix of w), then $x^{-1}w$ (resp. $wy^{-1}$) is undefined.

Proof of Lemma 2.3. Let p be the primitive root of β and let m be such that $\beta = p^m$.

First, observe that if, during the computation, a factor $\alpha\beta\gamma = \hat z[k; k+\ell]$ is covered by a factor $\alpha\beta^j\gamma$ in $\hat z$, then either (a) or (b) holds. Indeed, if $\alpha\beta\gamma = (\alpha\beta^j\gamma)[i; i+\ell]$ for some i, then β is a factor of $\beta^j$ starting at position $\beta^j[i]$. As mentioned above, $i \in |p| \cdot \mathbb{N}$, and position $\beta^j[i]$ is either preceded or succeeded by $p^{mj/2} = \beta^{j/2}$. Therefore, (a) or (b) is satisfied.

Let $z_0 = z$, let $z_n$ be the word $\hat z$ after the n-th pumping step in the algorithm, and let $y = p^{mj-2}$. For each n, we will define a unique factorization

$$z_n = x_{n,0}\,y\,x_{n,1}\,y \cdots y\,x_{n,n}$$

where p is a suffix of $x_{n,i}$ for $i = 0, \dots, n-1$ and p is a prefix of $x_{n,i}$ for $i = 1, \dots, n$. This factorization is defined inductively: naturally, we start with $x_{0,0} = z_0 = z$. Assume $z_n$ is factorized in the above manner. Let $\alpha\beta\gamma = z_n[k; k+\ell]$ be the factor, such that neither (a) nor (b) holds, which we replace in the (n+1)-st step (if there is no such factor, the algorithm terminates and we do not have to define $z_{n+1}$). By contradiction, assume that α starting at position $z_n[k]$ is covered by the i-th factor y in the factorization of $z_n$ for some $1 \le i \le n$. By the first observation, the factor $\beta\gamma = z_n[k+|\alpha|; k+\ell]$ must overlap with $x_{n,i}$. However, as p is a prefix of $x_{n,i}$, the factor $\beta = z_n[k+|\alpha|; k+|\alpha\beta|]$ has to cover the prefix p of $x_{n,i}$ or it has to cover one of the p's in y. This implies that γ is preceded by $p^{mj/2} = \beta^{j/2}$ and (b) holds, a contradiction. Symmetrically, γ is not covered by one of the factors y in $z_n$ either.

Thus, $\beta = z_n[k+|\alpha|; k+|\alpha\beta|]$ is covered by some $x_{n,i}$ in the factorization of $z_n$, and $x_{n,i}$ can be factorized $x_{n,i} = u\beta v$ where $u \ne \varepsilon$ and $v \ne \varepsilon$. Note that the length of $x_{n,i}$ has to be at least $|\beta| + 2$. Now, let $x_{n+1,h} = x_{n,h}$ for $h = 0, \dots, i-1$, let $x_{n+1,h+1} = x_{n,h}$ for $h = i+1, \dots, n$, let $x_{n+1,i} = up$, and let $x_{n+1,i+1} = pv$. Observe that this defines the desired factorization. Also note that

$$|x_{n+1,i}| = |u| + |p| = |x_{n,i}| - |\beta| - |v| + |p| \le |x_{n,i}| - |v| < |x_{n,i}|$$

and, symmetrically, $|x_{n+1,i+1}| < |x_{n,i}|$. Thus, in each pumping step, we replace one of the factors $x_{n,i}$ by two strictly shorter factors $x_{n+1,i}$ and $x_{n+1,i+1}$. As we have noted above, a factor $x_{n,i}$ cannot be pumped anymore if it is shorter than $|\beta| + 2$. Eventually, all the factors will be too short and the pumping algorithm will stop. □

3 Pixton's Variant of Splicing 

In this section we use the definition of the splicing operation as it was introduced in [17]. A triplet of words $r = (u_1, u_2; v) \in (\Sigma^*)^3$ is called a (splicing) rule. The words $u_1$ and $u_2$ are called left and right site of r, respectively, and v is the bridge of r. This splicing rule can be applied to two words $w_1 = x_1u_1y_1$ and $w_2 = x_2u_2y_2$, that each contain one of the sites, in order to create the new word $z = x_1vy_2$, see Figure 2. This operation is called splicing and it is denoted by $(w_1, w_2) \vdash_r z$.

Figure 2: Splicing of the words $x_1u_1y_1$ and $x_2u_2y_2$ by the rule $r = (u_1, u_2; v)$.
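A single Pixton splicing step can be sketched as follows, with a rule $(u_1, u_2; v)$ encoded as the 3-tuple `(u1, u2, v)`. Every occurrence of $u_1$ in $w_1$ fixes a prefix $x_1$, every occurrence of $u_2$ in $w_2$ fixes a suffix $y_2$, and the result is $x_1vy_2$. Names are hypothetical.

```python
def occurrences(w: str, u: str) -> list[int]:
    """Start indices of u as a factor of w."""
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def pixton_splice(w1: str, w2: str, rule: tuple[str, str, str]) -> set[str]:
    """All z = x1 v y2 with w1 = x1 u1 y1 and w2 = x2 u2 y2, for rule = (u1, u2, v)."""
    u1, u2, v = rule
    return {w1[:i] + v + w2[j + len(u2):]
            for i in occurrences(w1, u1)
            for j in occurrences(w2, u2)}
```

Unlike the classic quadruple rules, the bridge v is arbitrary: `pixton_splice("xay", "pbq", ("a", "b", "v"))` yields $xvq$, where $v$ appears in neither input word.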
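For a rule r we define the splicing operator $\sigma_r$ such that for a language L

$$\sigma_r(L) = \{z \in \Sigma^* \mid \exists w_1, w_2 \in L : (w_1, w_2) \vdash_r z\}$$

and for a set of splicing rules R, we let

$$\sigma_R(L) = \bigcup_{r \in R} \sigma_r(L).$$

The reflexive and transitive closure of the splicing operator $\sigma_R$ is given by

$$\sigma_R^0(L) = L, \qquad \sigma_R^{i+1}(L) = \sigma_R^i(L) \cup \sigma_R(\sigma_R^i(L)) \ \text{ for } i \ge 0, \qquad \sigma_R^*(L) = \bigcup_{i \ge 0} \sigma_R^i(L).$$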

A finite set of axioms $I \subseteq \Sigma^*$ and a finite set of splicing rules $R \subseteq (\Sigma^*)^3$ form a splicing system $(I, R)$. Every splicing system $(I, R)$ generates a language $L(I, R) = \sigma_R^*(I)$. Note that $L(I, R)$ is the smallest language which is closed under the splicing operator $\sigma_R$ and includes I. It is known that the language generated by a splicing system is regular, see [17]. A (regular) language L is called a splicing language if a splicing system $(I, R)$ exists such that $L = L(I, R)$.
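The operators $\sigma_R$ and $\sigma_R^*$ can be iterated directly on finite sets. The sketch below is hypothetical in its names and, because $L(I, R)$ is usually infinite, truncates to words of length at most `max_len`; discarding longer intermediate words means it computes only an under-approximation of $L(I, R) \cap \Sigma^{\le \text{max\_len}}$ in general. The example system $\{b, baa\}$ with the single rule $(baa, b; baa)$ is our own illustration of a Pixton system generating $b(aa)^*$.

```python
def occurrences(w: str, u: str) -> list[int]:
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def pixton_splice(w1: str, w2: str, rule: tuple[str, str, str]) -> set[str]:
    u1, u2, v = rule
    return {w1[:i] + v + w2[j + len(u2):]
            for i in occurrences(w1, u1) for j in occurrences(w2, u2)}


def sigma(lang: set[str], rules: list) -> set[str]:
    """sigma_R(L): the union of sigma_r(L) over r in R, for a finite set L."""
    return {z for r in rules for w1 in lang for w2 in lang
            for z in pixton_splice(w1, w2, r)}


def closure(axioms: set[str], rules: list, max_len: int) -> set[str]:
    """sigma_R^0(I), sigma_R^1(I), ... truncated to words of length <= max_len."""
    level = {w for w in axioms if len(w) <= max_len}
    while True:
        nxt = level | {z for z in sigma(level, rules) if len(z) <= max_len}
        if nxt == level:
            return level
        level = nxt


print(sorted(closure({"b", "baa"}, [("baa", "b", "baa")], 7), key=len))
```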

A rule r is said to respect a language L if $\sigma_r(L) \subseteq L$. It is easy to see that for any splicing system $(I, R)$, every rule $r \in R$ respects the generated language $L(I, R)$. Moreover, a rule $r \notin R$ respects $L(I, R)$ if and only if $L(I, R \cup \{r\}) = L(I, R)$. We say a splicing $(w_1, w_2) \vdash_r z$ respects a language L if $w_1, w_2 \in L$ and r respects L; obviously, this implies $z \in L$, too.

Pixton introduced this variant of splicing in order to give a simple proof for the regularity of languages generated by splicing systems. As Pixton's variant of splicing is more general than the classic splicing, defined in the introduction and in Section 4, his proof of regularity also applies to classic splicing systems. For a moment, let us call a classic splicing rule a quadruplet and a Pixton splicing rule a triplet. Consider a quadruplet $r = (u_1, v_1; u_2, v_2)$. It is easy to observe that whenever we can use r in order to splice $w_1 = x_1u_1v_1y_1$ with $w_2 = x_2u_2v_2y_2$ to obtain the word $z = x_1u_1v_2y_2$, we can use the triplet $s = (u_1v_1, u_2v_2; u_1v_2)$ in order to splice $(w_1, w_2) \vdash_s z$ as well.
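The translation of a quadruplet into a triplet can be checked exhaustively on short words. In this sketch (helper names hypothetical) `classic_splice` implements the quadruple semantics, `pixton_splice` the triplet semantics, and `to_triplet` maps $(u_1, v_1; u_2, v_2)$ to $(u_1v_1, u_2v_2; u_1v_2)$; the loop confirms that both produce exactly the same words for a few sample rules over $\{a, b\}$. This is a sanity check on examples, not a proof.

```python
from itertools import product


def cut_points(w: str, u: str, v: str) -> list[int]:
    return [c for c in range(len(u), len(w) - len(v) + 1)
            if w[c - len(u):c] == u and w[c:c + len(v)] == v]


def classic_splice(w1: str, w2: str, rule: tuple) -> set[str]:
    u1, v1, u2, v2 = rule
    return {w1[:c1] + w2[c2:] for c1 in cut_points(w1, u1, v1) for c2 in cut_points(w2, u2, v2)}


def occurrences(w: str, u: str) -> list[int]:
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def pixton_splice(w1: str, w2: str, rule: tuple) -> set[str]:
    u1, u2, v = rule
    return {w1[:i] + v + w2[j + len(u2):] for i in occurrences(w1, u1) for j in occurrences(w2, u2)}


def to_triplet(rule: tuple) -> tuple:
    """(u1, v1; u2, v2) -> (u1 v1, u2 v2; u1 v2)."""
    u1, v1, u2, v2 = rule
    return (u1 + v1, u2 + v2, u1 + v2)


def agree_on_short_words(rules: list, alphabet: str = "ab", max_len: int = 3) -> bool:
    words = ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]
    return all(classic_splice(w1, w2, q) == pixton_splice(w1, w2, to_triplet(q))
               for q in rules for w1 in words for w2 in words)


print(agree_on_short_words([("a", "b", "", "ab"), ("ab", "", "a", "b"), ("b", "a", "a", "b")]))
```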
However, for a triplet $s = (u_1, u_2; v)$ where v is not a concatenation of a prefix of $u_1$ and a suffix of $u_2$, there is no quadruplet r that can be used for the same splicings. Moreover, the class of classical splicing languages is strictly included in the class of Pixton splicing languages; e.g., the language

$$L = cx^*ae + cx^*be + dcx^*bef$$

over the alphabet $\{a, b, c, d, e, f, x\}$ is a Pixton splicing language but not a classical splicing language, see [4]. For the rest of this section we focus on Pixton's splicing variant and by a rule we always mean a triplet.

The main result of this section states that if a regular language L is a splicing language, then it is created by a particular splicing system $(I, R)$ which only depends on the syntactic monoid of L.

Theorem 3.1. Let L be a splicing language and $m = |M_L|$. The splicing system $(I, R)$ with $I = \Sigma^{\le m^2+6m} \cap L$ and

generates the language $L = L(I, R)$.

As the language generated by the splicing system $(I, R)$ is constructible, Theorem 3.1 implies that the problem whether or not a given regular language is a splicing language is decidable. A detailed discussion of the decidability result is given in Section 5.

Let L be a formal language. Clearly, every set of words $J \subseteq L$ and set of rules S where every rule in S respects L generates a subset $L(J, S) \subseteq L$. Therefore, in Theorem 3.1 the inclusion $L(I, R) \subseteq L$ is obvious. The rest of this section is devoted to the proof of the converse inclusion $L \subseteq L(I, R)$. Consider a splicing language L. One of the main techniques we use in the proof is that, whenever a word z is created by a series of splicings from a set of words in L and a set of rules that respect L, then we can use a modified set of words from L and modified rules which respect L in order to obtain the same word z by splicing. If z is sufficiently long, these words can be chosen such that they are all shorter than z and the sites and bridges of the rules also satisfy certain length restrictions. Of course, our goal is to show that we can create z by splicing from a subset of I with rules which all satisfy the length bounds given by R (as defined in Theorem 3.1). In Section 3.1 we will present techniques to obtain rules that respect L from other rules respecting L, and we show how we can modify a single splicing step such that the words used for splicing are not significantly longer than the splicing result. In Section 3.2 we use these techniques to modify series of splicings in the way described above (Lemma 3.8). Finally, in Section 3.3 we prove Theorem 3.1.

3.1 Rule Modifications 

Let us start with the simple observation that we can extend the sites and the bridge of a rule r 
such that the new rule respects all languages which are respected by r. 

Lemma 3.2. Let $r = (u_1, u_2; v)$ be a rule which respects a language L. For every word x, the rules $(xu_1, u_2; xv)$, $(u_1x, u_2; v)$, $(u_1, xu_2; v)$, and $(u_1, u_2x; vx)$ respect L as well.

Proof. Let s be any of the four rules $(xu_1, u_2; xv)$, $(u_1x, u_2; v)$, $(u_1, xu_2; v)$, or $(u_1, u_2x; vx)$. In order to prove that s respects L we have to show that, for all $w_1, w_2 \in L$ and $z \in \Sigma^*$ such that $(w_1, w_2) \vdash_s z$, we have $z \in L$, too. Indeed, if $(w_1, w_2) \vdash_s z$, then $(w_1, w_2) \vdash_r z$ and, as r respects L, we conclude $z \in L$. □

Henceforth, we will refer to the rules $(xu_1, u_2; xv)$ and $(u_1, u_2x; vx)$ as extensions of the bridge and to the rules $(u_1x, u_2; v)$ and $(u_1, xu_2; v)$ as extensions of the left and right site, respectively.
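The heart of Lemma 3.2 is that every splicing by an extended rule s is also a splicing by r with the same result. The sketch below (names hypothetical) builds the four extensions and checks this inclusion exhaustively for all pairs of words of length at most 3 over $\{a, b\}$ and a sample rule; it is an empirical sanity check, not a substitute for the proof.

```python
from itertools import product


def occurrences(w: str, u: str) -> list[int]:
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def pixton_splice(w1: str, w2: str, rule: tuple) -> set[str]:
    u1, u2, v = rule
    return {w1[:i] + v + w2[j + len(u2):] for i in occurrences(w1, u1) for j in occurrences(w2, u2)}


def extensions(rule: tuple, x: str) -> list[tuple]:
    """The four rules of Lemma 3.2 built from rule = (u1, u2, v) and a word x."""
    u1, u2, v = rule
    return [
        (x + u1, u2, x + v),   # bridge extended to the left
        (u1 + x, u2, v),       # left site extended
        (u1, x + u2, v),       # right site extended
        (u1, u2 + x, v + x),   # bridge extended to the right
    ]


def extensions_are_subsumed(rule: tuple, xs: list, alphabet: str = "ab", max_len: int = 3) -> bool:
    words = ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]
    return all(pixton_splice(w1, w2, s) <= pixton_splice(w1, w2, rule)
               for x in xs for s in extensions(rule, x) for w1 in words for w2 in words)
```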

Next, for a language L, let us investigate the syntactic class of a rule $r = (u_1, u_2; v)$. The syntactic class (with respect to L) of r is the set of rules $[r]_L = [u_1]_L \times [u_2]_L \times [v]_L$, and two rules r and s are syntactically congruent (with respect to L), denoted by $r \sim_L s$, if $s \in [r]_L$.

Lemma 3.3. Let r be a rule which respects a language L. Every rule $s \in [r]_L$ respects L.

Proof. Let $r = (u_1, u_2; v)$ and $s = (\hat u_1, \hat u_2; \hat v)$. Thus, $u_i \sim_L \hat u_i$ for $i = 1, 2$ and $v \sim_L \hat v$. For $\hat w_1 = x_1\hat u_1y_1 \in L$ and $\hat w_2 = x_2\hat u_2y_2 \in L$ we have to show that $\hat z = x_1\hat vy_2 \in L$. For $i = 1, 2$, let $w_i = x_iu_iy_i$ and note that $w_i \sim_L \hat w_i$; hence, $w_i \in L$. Furthermore, $(w_1, w_2) \vdash_r x_1vy_2 = z \in L$ as r respects L, and $\hat z \in L$ as $z \sim_L \hat z$. □

Consider a splicing $(x_1u_1y_1, x_2u_2y_2) \vdash_r x_1vy_2$ which respects a regular language L, as shown on the left side of Figure 3. The factors $u_1y_1$ and $x_2u_2$ may be relatively long but they do not occur as factors in the resulting word $x_1vy_2$. In particular, it is possible that two long words are spliced and the outcome is a relatively short word. Using Lemmas 3.2 and 3.3, we can find shorter words in L and a modified splicing rule which can be used to obtain $x_1vy_2$.



Lemma 3.4. Let $r = (u_1, u_2; v)$ be a rule which respects a regular language L and let $w_1 = x_1u_1y_1 \in L$, $w_2 = x_2u_2y_2 \in L$. There is a rule $s = (\hat u_1, \hat u_2; v)$ which respects L and words $\hat w_1 = x_1\hat u_1 \in L$, $\hat w_2 = \hat u_2y_2 \in L$ such that $|\hat u_1|, |\hat u_2| < |M_L|$. More precisely, $\hat u_1 \in [u_1y_1]_L$ and $\hat u_2 \in [x_2u_2]_L$.

In particular, whenever $(w_1, w_2) \vdash_r x_1vy_2 = z$, there is a splicing $(\hat w_1, \hat w_2) \vdash_s z$ which respects L, where $\hat w_1$, $\hat w_2$, and s have the properties described above.





Figure 3: The factors $u_1y_1$ and $x_2u_2$ can be replaced by short words.
Proof. By Lemma 3.2, the rule $(u_1y_1, x_2u_2; v)$ respects L. Choose $\hat u_1 \in [u_1y_1]_L$ and $\hat u_2 \in [x_2u_2]_L$ as shortest words from the syntactic classes, respectively; as such, $|\hat u_1|, |\hat u_2| < |M_L|$ (Lemma 2.2), and $\hat w_1 = x_1\hat u_1 \in L$, $\hat w_2 = \hat u_2y_2 \in L$. Furthermore, by Lemma 3.3, $s = (\hat u_1, \hat u_2; v)$ respects L. □

Another way of modifying a splicing $(w_1, w_2) \vdash_r z$ is to extend the bridge of r to the left until it covers a prefix of $w_1$. Afterwards, we can use the same method we used in Lemma 3.4 and replace $w_1$ by a short word, see Figure 4. As the splicing operation is symmetric, we can also extend the bridge of r rightwards and replace $w_2$ by a short word, even though Lemma 3.5 does not explicitly state this.



Figure 4: The word $x_1u_1y_1$ can be replaced by a short word as long as we extend the bridge of the splicing rule accordingly.



Lemma 3.5. Let $r = (u_1, u_2; v)$ be a rule which respects a regular language L and let $w_1 = x_1u_1y_1 \in L$. Every rule $s = (w, u_2; x_1v)$, where $w \in [w_1]_L \subseteq L$, respects L. In particular, there is a rule s, as above, where $|w| < |M_L|$.

Proof. By Lemma 3.2, we see that $(x_1u_1y_1, u_2; x_1v)$ respects L and, by Lemma 3.3, $s = (w, u_2; x_1v)$ respects L. If $w \in [w_1]_L$ is a shortest word from the set, then $|w| < |M_L|$ by Lemma 2.2. □



3.2 Series of Splicings 

We now investigate words which are created by a series of successive splicings which all respect a regular language L. Observe that if a word is created by two (or more) successive splicings, but the bridges of the rules do not overlap in the generated word, then the order of these splicings is irrelevant. The notation in Remark 3.6 is the same as in Figure 5.

Figure 5: The word $x_1v_1w'v_2y_3$ can be created either by using the right splicing first or by using the left splicing first.

Remark 3.6. Consider rules $r = (u_1, u_2; v_1)$ and $s = (u_2', u_3; v_2)$ and words $w_1 = x_1u_1y_1$, $w_2 = x_2u_2w'u_2'y_2$, and $w_3 = x_3u_3y_3$. The word $z = x_1v_1w'v_2y_3$ can be obtained by the splicings

$$(w_1, w_2) \vdash_r x_1v_1w'u_2'y_2 = z', \qquad (z', w_3) \vdash_s z$$

as well as

$$(w_2, w_3) \vdash_s x_2u_2w'v_2y_3 = z'', \qquad (w_1, z'') \vdash_r z,$$

which makes the order of the splicing steps irrelevant.
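Remark 3.6 can be replayed on a concrete instance. In this sketch the letters are chosen by us for illustration: $x_1 = p$, $u_1 = a$, $y_1 = q$; $x_2 = r$, $u_2 = b$, $w' = m$, $u_2' = c$, $y_2 = s$; $x_3 = t$, $u_3 = d$, $y_3 = n$; and the bridges are the letters V and W. Each site occurs exactly once, so both orders yield the single word $z = x_1v_1w'v_2y_3 = pVmWn$.

```python
def occurrences(w: str, u: str) -> list[int]:
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def pixton_splice(w1: str, w2: str, rule: tuple) -> set[str]:
    u1, u2, v = rule
    return {w1[:i] + v + w2[j + len(u2):] for i in occurrences(w1, u1) for j in occurrences(w2, u2)}


r = ("a", "b", "V")   # r = (u1, u2; v1)
s = ("c", "d", "W")   # s = (u2', u3; v2)
w1, w2, w3 = "paq", "rbmcs", "tdn"

# Left splicing first: (w1, w2) |-r z', then (z', w3) |-s z
left_first = {z for zp in pixton_splice(w1, w2, r) for z in pixton_splice(zp, w3, s)}
# Right splicing first: (w2, w3) |-s z'', then (w1, z'') |-r z
right_first = {z for zpp in pixton_splice(w2, w3, s) for z in pixton_splice(w1, zpp, r)}
print(left_first, right_first)
```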

Now, consider a word z which is created by two successive splicings from words $w_i = x_iu_iy_i$ for $i = 1, 2, 3$ as in Figure 6. If no factor of $w_1$ or of the bridge in the first splicing is a part of z, then we can find another splicing rule s such that $(w_3, w_2) \vdash_s z$ and the bridge of s is the bridge used in the second splicing.

Lemma 3.7. Let L be a language, $w_i = x_iu_iy_i \in L$ for $i = 1, 2, 3$, and let $r_1 = (u_1, u_2; v_1)$, $r_2 = (u_3, u_4; v_2)$ be rules respecting L. If there are splicings

$$(w_1, w_2) \vdash_{r_1} x_1v_1y_2 = w_4 = x_4u_4y_4, \qquad (w_3, w_4) \vdash_{r_2} x_3v_2y_4 = z$$

where $y_4$ is a suffix of $y_2$, then there is a rule $s = (u_3, u_2; v_2)$ which respects L and $(w_3, w_2) \vdash_s z$.
Figure 6: Two successive splicings can be replaced by one splicing in the case when the factor $x_1$ and the bridge $v_1$ do not contribute to the resulting word.

Proof. By extending the bridge $v_1$ of $r_1$ and the right site $u_4$ of $r_2$ (Lemma 3.2), we may assume the factors $v_1$ and $u_4$ match in $w_4$: let $w_4[i; j] = v_1$ and $w_4[i'; j'] = u_4$;

• if $i < i'$ we extend $u_4$ in $r_2$ to the left by $i' - i$ letters,

• if $i > i'$ we extend $v_1$ in $r_1$ to the left by $i - i'$ letters and we extend $u_1$ accordingly, and

• we extend $v_1$ in $r_1$ to the right by $j' - j$ letters and we extend $u_2$ accordingly; note that $j' \ge j$ as $y_4$ is a suffix of $y_2$.

Clearly, the extended factors $v_1$ and $u_4$ match in $w_4$. The left site $u_3$ and the bridge $v_2$ of $r_2$ are not modified by this extension. Additionally, we have $x_1 = x_4$ and $y_2 = y_4$. Let $s = (u_3, u_2; v_2)$ (where $u_2$ is the extended right site of $r_1$). As desired, $(w_3, w_2) \vdash_s x_3v_2y_4 = z$ since $w_2 = x_2u_2y_4$.

Next, let us prove that s respects L. Let $w_i' = x_i'u_iy_i' \in L$ for $i = 2, 3$. If $x_3'v_2y_2' \in L$ for all those words, then s respects L. Indeed, we may splice

$$(w_1, w_2') \vdash_{r_1} x_1v_1y_2' = x_1u_4y_2', \qquad (w_3', x_1u_4y_2') \vdash_{r_2} x_3'v_2y_2'.$$

Therefore, $x_3'v_2y_2' \in L$ and s respects L. □

Consider a splicing system $(J, S)$ and its generated language $L = L(J, S)$. Let n be the length of the longest word in J and let μ be the length-lexicographically largest word that is a component of a rule in S. Define $W_\mu = \{w \in \Sigma^* \mid w \le_{\mathrm{ll}} \mu\}$ as the set of all words that are at most as large as μ in length-lexicographical order. Furthermore, let $I = \Sigma^{\le n} \cap L$ be a set of axioms and let

$$R = \{r \in W_\mu^3 \mid r \text{ respects } L\}$$

be a set of rules. It is not difficult to see that $J \subseteq I$, $S \subseteq R$, and $L = L(I, R)$. Whenever convenient, we may assume that a splicing language L is generated by a splicing system which is of the form of $(I, R)$.

Now, consider the creation of a word $xzy \in L$ by splicing in $(I, R)$. The creation of xzy can be traced back to a word $z_1 = x_1zy_1$ where either $z_1 \in I$ or $z_1$ is created by a splicing that affects z, i.e., the bridge in this splicing overlaps with the factor z in $x_1zy_1$. The next lemma describes this creation of $xzy = z_{k+1}$ by k splicings in $(I, R)$, and shows that we can choose the rules and words which are used to create $z_{k+1}$ from $z_1$ such that the words and bridges of rules are not significantly longer than $\ell = \max\{|x|, |y|\}$.

Lemma 3.8. Let L be a splicing language, let $\ell, n \in \mathbb{N}$, let $m = |M_L|$, and let μ be a word with $|\mu| \ge \ell + 2m$ such that for $I = \Sigma^{\le n} \cap L$ and $R = \{r \in W_\mu^3 \mid r \text{ respects } L\}$ we have $L = L(I, R)$.

Let $z_{k+1} = x_{k+1}zy_{k+1}$, with $|x_{k+1}|, |y_{k+1}| \le \ell$, be a word that is created by k splicings from a word $z_1 = x_1zy_1$ where either $z_1 \in I$ or $z_1$ is created by a splicing $(w_0, w_0') \vdash_s z_1$ with $w_0, w_0' \in L$, $s \in R$, and the bridge of s overlaps with z in $z_1$. Furthermore, for $i = 1, \dots, k$ the intermediate splicings are either

(i) $(w_i, z_i) \vdash_{r_i} x_{i+1}zy_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R$, $y_{i+1} = y_i$, and the bridge of $r_i$ is covered by the prefix $x_{i+1}$, or

(ii) $(z_i, w_i) \vdash_{r_i} x_{i+1}zy_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R$, $x_{i+1} = x_i$, and the bridge of $r_i$ is covered by the suffix $y_{i+1}$.
There are rules and words creating $z_{k+1}$, as above, satisfying in addition:

1. There is $k' \le k$ such that for $i = 1, \dots, k'$ all splicings are of the form (i) and for $i = k'+1, \dots, k$ all splicings are of the form (ii).

2. For $i = 1, \dots, k$ the following bounds apply: $|x_i|, |y_i| < \ell + 2m$, $|w_i| < m$, and $r_i \in \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<\ell+2m}$. In particular, if $n \ge m$, then $w_1, \dots, w_k \in I$.

Proof. Statement 1 follows by Remark 3.6. Note that if $k = 0$, then statement 2 is trivially true. By the first statement, $x_{k'+1} = x_{k'+2} = \dots = x_{k+1}$ and $y_1 = y_2 = \dots = y_{k'+1}$. Let us consider the splicings of the form (i), which are the steps $i = 1, \dots, k'$. The notation we employ in order to prove the second statement for $i = 1, \dots, k'$ is chosen to match the notation in Figure 7.



Figure 7: The $i$-th splicing step in the proof of Lemma 3.8, where $v_i = u_{i+1}\delta_{i+1}$ and $x_{i+1} = u_{i+1}\delta_{i+1}x_i'$.

Let $r_i = (w_i, u_i; v_i)$ where $w_i \in \Sigma^{<m} \cap L$ (Lemma 3.5) and $x_i = u_i x_i'$ (by Lemma 3.2, we extended the site $u_i$ to cover a prefix of $x_i$) such that $u_{i+1} x_{i+1}' = v_i x_i'$ with $u_{k'+1} = \varepsilon$ and $x_{k'+1}' = x_{k'+1} = x_{k+1}$. Lemma 3.7 justifies the assumption that every splicing occurs to the left of the preceding splicing, i.e., $x_i'$ is a proper suffix of $x_{i+1}'$. Note that, as $|x_{k'+1}'| \le \ell$, the length of $x_i'$ is bounded by $\ell$. Now, choose $\delta_{i+1}$ such that $x_{i+1}' = \delta_{i+1} x_i'$; thus, $u_{i+1}\delta_{i+1} = v_i$.

For $i = 2, \dots, k'$ we replace $u_i$ by a shortest word from $[u_i]_L$. Note that this does not change the fact that all rules respect $L$ (Lemma 3.3). We also replace the corresponding prefixes of $x_i$ and of $v_{i-1}$ by this factor. (There is no need to change $v_{k'}$ as $|v_{k'}| = |\delta_{k'+1}| \le |x_{k+1}| \le \ell$.) Therefore, $|x_i| < |x_i'| + m \le \ell + m$ and $r_i \in \Sigma^{<m} \times \Sigma^{<m} \times \Sigma^{<\ell+m}$ if $i \ne 1$ (Lemma 2.2). We do not change $u_1$ yet as this may affect the splicing $(w_0, w_0') \vdash_s z_1$ if it exists. Note that, for $i = 2, \dots, k'$, we have actually proven a stronger bound than claimed in statement 2 of Lemma 3.8. Even though we have not proven the bound for $r_1$ yet, we have already established $r_1 \in \Sigma^{<m} \times \Sigma^* \times \Sigma^{<\ell+m}$. Symmetrically, we can consider statement 2 to be proven for $i = k'+2, \dots, k$, i.e., only the prefix $x_1$ and the suffix $y_1 = y_{k'+1}$ have not been modified yet.

Now, let $x_1 = u_1 x_1'$ (as above) and, symmetrically, let $y_1 = y_{k'+1}' u_{k'+1}$ where $u_{k'+1}$ is the left site of $r_{k'+1}$. If $k' = 0$ (or $k' = k$), then $u_1$ (resp. $u_{k'+1}$) can be considered empty and $x_1' = x_{k+1}$ (resp. $y_{k'+1}' = y_{k+1}$). If $z_1 \in I$, we replace $u_1$ and $u_{k'+1}$ by shortest words from their syntactic classes, respectively, and the claim holds. Otherwise, $(w_0, w_0') \vdash_s z_1$ where $s = (u_0, u_0'; v)$, $w_0 = x u_0$, and $w_0' = u_0' y$, by Lemma 3.4. Thus,

$$z_1 = u_1 x_1' z y_{k'+1}' u_{k'+1} = x v y.$$

In the case when $v$ does not overlap with the prefix $u_1$ of $z_1$, replace $u_1$ by a shortest word from its syntactic class. If $v$ and the prefix $u_1$ overlap, let $u_1 = \delta_1\delta_2$ such that $\delta_2$ is the overlap and replace $\delta_1$ and $\delta_2$ by shortest words from their syntactic classes, respectively. In both cases, $|u_1| < 2m$ (Lemma 2.2), and if $v$ was modified, it got shorter; hence, we still have $v \in W_\mu$. Observe that $|x_1| < \ell + 2m$ and $r_1 \in \Sigma^{<m} \times \Sigma^{<2m} \times \Sigma^{<\ell+m}$. Analogously, $u_{k'+1}$ and $r_{k'+1}$ can be treated in order to conclude the proof of statement 2. □

3.3 Proof of Theorem 3.1 

Let $L$ be a splicing language and $m = |M_L|$. Throughout this section, by $\sim$ we denote the equivalence relation $\sim_L$ and by $[\cdot]$ we denote the corresponding equivalence classes $[\cdot]_L$.

Recall that Theorem 3.1 claims that the splicing system $(I,R)$ with $I = \Sigma^{<m^2+6m} \cap L$ and

$$R = \left\{ r \in \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<m^2+10m} \;\middle|\; r \text{ respects } L \right\}$$

generates $L$. The proof is divided into two parts. In the first part, Lemma 3.9, we prove that $L$ is generated by a splicing system $(I, R')$ where all sites of rules in $R'$ are shorter than $2m$, but we do not care about the lengths of the bridges. The second part then concludes the proof by showing that no rule in $R'$ whose bridge has length $m^2 + 10m$ or more is essential for the creation of the language $L$ by splicing.

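Lemma 3.9. Let $L$, $m$, and $I$ be as above. There is $n \in \mathbb{N}$ such that the splicing system $(I, R')$ with

$$R' = \left\{ r \in \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{\le n} \;\middle|\; r \text{ respects } L \right\}$$

generates the language $L = L(I, R')$.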

Proof. As $I \subseteq L$ and every rule in $R'$ respects $L$, it is clear that $L(I, R') \subseteq L$ for any $n$; we only need to prove the converse inclusion.

As $L$ is a splicing language, $L = L(J, S)$ for some splicing system $(J, S)$. Let $n$ be larger than the length of every bridge of every rule in $S$, and let $n \ge 4m^2$.

In order to prove $L \subseteq L(I, R')$ we use induction on the length of words in $L$. For all $w \in L$ with $|w| < m^2 + 6m$, by definition, $w \in I \subseteq L(I, R')$.

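Now, consider $w \in L$ with $|w| \ge m^2 + 6m$. The induction hypothesis states that every word $w' \in L$ with $|w'| < |w|$ belongs to $L(I, R')$. Factorize $w = x\alpha\beta\gamma\delta y$ such that $|x| = |y| = 3m$, $|\alpha\beta\gamma| = m^2$, $|\beta| \ge 1$, $\alpha \sim \alpha\beta$, and $\gamma \sim \beta\gamma$.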
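The factorization $\alpha\beta\gamma$ with $\alpha \sim \alpha\beta$ and $\gamma \sim \beta\gamma$ is supplied by Lemma 2.1, which is not reproduced here. A pigeonhole argument of the following kind yields such a factorization for every word of length at least $m^2$: among its $m^2 + 1$ cut positions, two must induce the same pair (class of the prefix, class of the suffix). The Python sketch below assumes that $L$ is given by a complete minimal DFA, so that two words are syntactically congruent exactly when they induce the same state transformation; the DFA encoding and the function names are hypothetical.

```python
def transformation(w, states, delta):
    """State map p -> delta*(p, w). For a minimal complete DFA, two words are
    syntactically congruent iff they induce the same map."""
    images = []
    for p in states:
        q = p
        for c in w:
            q = delta[(q, c)]
        images.append(q)
    return tuple(images)


def factorize(s, states, delta):
    """Return (alpha, beta, gamma) with s = alpha beta gamma, beta nonempty,
    alpha ~ alpha beta and gamma ~ beta gamma. A repeated pair (class of prefix,
    class of suffix) is guaranteed if len(s) >= m * m, m = monoid size."""
    first_seen = {}
    for k in range(len(s) + 1):
        key = (transformation(s[:k], states, delta),
               transformation(s[k:], states, delta))
        if key in first_seen:
            i = first_seen[key]
            return s[:i], s[i:k], s[k:]
        first_seen[key] = k
    return None


# Example: L = words over {a, b} with an even number of a's (m = |M_L| = 2).
STATES = [0, 1]
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
print(factorize("abab", STATES, DELTA))  # ('a', 'b', 'ab')
```

Since $\alpha \sim \alpha\beta$ implies $\alpha\beta^j \sim \alpha\beta$ for every $j \ge 1$, such a factorization is what makes the pumping argument below possible.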

The proof idea is to use a pumping argument on $\alpha\beta\gamma$ in order to obtain a very long word. This word has to be created by a series of splicings in $(J, S)$. We show that these splicings can be modified in order to create $w$ by splicing from a set of strictly shorter words and with rules from $R'$. Then, the induction hypothesis implies $w \in L(I, R')$.

Choose $j$ sufficiently large ($j > n$ and $J$ does not contain words of length $j$ or more). We let $z = \alpha\beta^j\gamma\delta$ and investigate the creation of $xzy \in L$. As $z$ is not a factor of any word in $J$, every word in $L$ which contains $z$ is created by some splicing in $(J, S)$. Thus, we can trace back the creation of $xzy$ by splicing to the point where the factor $z$ is affected for the last time. Let $z_{k+1} = x_{k+1} z y_{k+1}$, where $x_{k+1} = x$ and $y_{k+1} = y$, be created by $k$ splicings from a word $z_1 = x_1 z y_1$ where $x_1 z y_1$ is created by a splicing $(w_0, w_0') \vdash_s z_1$ with $w_0, w_0' \in L$, $s \in S$, and the bridge of $s$ overlaps with $z$ in $z_1$. Furthermore, for $i = 1, \dots, k$ the intermediate splicings are either

(i) $(w_i, z_i) \vdash_{r_i} x_{i+1} z y_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in S$, $y_{i+1} = y_i$, and the bridge of $r_i$ is covered by the prefix $x_{i+1}$, or

(ii) $(z_i, w_i) \vdash_{r_i} x_{i+1} z y_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in S$, $x_{i+1} = x_i$, and the bridge of $r_i$ is covered by the suffix $y_{i+1}$.

Following Lemma 3.8 (with $\ell = 3m$), we may assume that $w_1, \dots, w_k \in I$, $r_1, \dots, r_k \in \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<4m}$, thus $r_1, \dots, r_k \in R'$, and $|x_1|, |y_1| < 5m$. Furthermore, we may use the same words and rules in order to create $w = x_{k+1}\alpha\beta\gamma\delta y_{k+1}$ from $x_1\alpha\beta\gamma\delta y_1$ by splicing, i.e., if $x_1\alpha\beta\gamma\delta y_1$ belongs to $L(I, R')$, so does $w$.

Now, consider the first splicing $(w_0, w_0') \vdash_s z_1 = x_1 z y_1$. By Lemma 3.4, we assume $s = (u_1, u_2; v)$ such that $w_0 = x u_1$, $w_0' = u_2 y$, and $|u_1|, |u_2| < m$ ($x$ and $y$ are newly chosen words). Hence,

$$z_1 = x v y = x_1 \alpha\beta^j\gamma\delta y_1,$$

where $x$ is a proper prefix of $x_1\alpha\beta^j\gamma\delta$ and $y$ is a proper suffix of $\alpha\beta^j\gamma\delta y_1$.

We will now pump down the factor $\beta^j$ to $\beta$ in order to obtain the words $\hat x$, $\hat v$, $\hat y$ from $x$, $v$, $y$, respectively, as follows:

1. If $v$ overlaps with $\beta^j$ but covers neither $\alpha$ nor $\gamma$, extend $v$ (Lemma 3.2) such that $v = \alpha\beta^j$. Observe that, now, the factor $\alpha\beta^j\gamma$ is covered by either $xv$ or $vy$.

2. If $\alpha\beta^j$ or $\beta^j\gamma$ is covered by one of $x$, $v$, or $y$, then replace this factor by $\alpha\beta$ or $\beta\gamma$, respectively. Otherwise, by symmetry, assume that $\alpha\beta^j\gamma$ is covered by $xv$ and, therefore, we can factorize

$$x = x_1\alpha\beta^{j_1}\beta_1, \qquad v = \beta_2\beta^{j_2}\gamma v'$$

where $\beta_1\beta_2 = \beta$ and $j_1 + j_2 + 1 = j$. The results of pumping are the words $\hat x = x_1\alpha\beta_1$, $\hat v = \beta_2\gamma v'$, and $\hat y = y$.

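Let $\hat u_1$ and $\hat u_2$ be the sites of $s$ that may have been altered due to the extension of $v$ and, by Lemma 3.4, assume $|\hat u_1|, |\hat u_2| < m$. If we used an extension for $v$, then $|\hat v| \le m^2$. No matter whether we used an extension, $t = (\hat u_1, \hat u_2; \hat v) \in R'$ and $(\hat x\hat u_1, \hat u_2\hat y) \vdash_t x_1\alpha\beta\gamma\delta y_1$ as desired. Observe that $\hat x$ is a prefix of $x_1\alpha\beta\gamma\delta$ and $\hat y$ is a suffix of $\alpha\beta\gamma\delta y_1$, and recall that $|x_1|, |y_1| < 5m$. Therefore, $|\hat x\hat u_1|, |\hat u_2\hat y| < |\alpha\beta\gamma\delta| + 6m = |w|$ and, by the induction hypothesis, $\hat x\hat u_1$ and $\hat u_2\hat y$ belong to $L(I, R')$. We conclude that $x_1\alpha\beta\gamma\delta y_1$ as well as $w$ belong to $L(I, R')$. □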

We are now prepared to prove the main result. 

Proof of Theorem 3.1. Recall that for a splicing language $L$ with $m = |M_L|$, we intend to prove that the splicing system $(I,R)$ with $I = \Sigma^{<m^2+6m} \cap L$ and

$$R = \left\{ r \in \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<m^2+10m} \;\middle|\; r \text{ respects } L \right\}$$

generates the language $L = L(I,R)$. Obviously, $L(I,R) \subseteq L$. By Lemma 3.9, there is a finite set of rules $R' \subseteq \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^*$ such that $L(I, R') = L$.

For a word $\mu$ we let $W_\mu = \{w \in \Sigma^* \mid w \le_{\ell\ell} \mu\}$, as we did before. Define the set of rules whose bridge is length-lexicographically bounded by $\mu$,

$$R_\mu = \left\{ r \in \Sigma^{<2m} \times \Sigma^{<2m} \times W_\mu \;\middle|\; r \text{ respects } L \right\},$$

and the language $L_\mu = L(I, R_\mu)$; clearly, $L_\mu \subseteq L$. For two words $\mu \le_{\ell\ell} \nu$ we see that $R_\mu \subseteq R_\nu$ and, hence, $L_\mu \subseteq L_\nu$. Thus, if $L_\mu = L$ for some word $\mu$, then for all words $\nu$ with $\mu \le_{\ell\ell} \nu$ we have $L_\nu = L$. As $L = L(I, R')$, there exists a word $\mu$ such that $L_\mu = L$. Let $\mu$ be the smallest word, in the length-lexicographic order, such that $L_\mu = L$. Note that if $|\mu| < m^2 + 10m$, then $R_\mu \subseteq R$ and $L = L_\mu \subseteq L(I,R)$. For the sake of contradiction, assume $|\mu| \ge m^2 + 10m$. Let $\nu$ be the next-smaller word than $\mu$ in the length-lexicographic order, and let $S = R_\nu$. Note that $L(I, S) \subsetneq L$ and $R_\mu \setminus S$ contains only rules whose bridge is $\mu$.
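The word $\nu$ used above is the immediate predecessor of $\mu$ in length-lexicographic order; this is why $R_\mu \setminus R_\nu$ consists exactly of the rules of $R_\mu$ whose bridge is $\mu$. A minimal Python sketch of this predecessor, treating words as numerals over the ordered alphabet (the function name is hypothetical):

```python
def ll_predecessor(mu: str, alphabet: str):
    """Largest word strictly smaller than mu in length-lexicographic order,
    or None if mu is the empty word (which has no predecessor)."""
    if mu == "":
        return None
    digits = [alphabet.index(c) for c in mu]
    i = len(digits) - 1
    # Decrement like a counter in base |alphabet|: trailing minimal letters
    # turn into maximal letters (borrow).
    while i >= 0 and digits[i] == 0:
        digits[i] = len(alphabet) - 1
        i -= 1
    if i < 0:
        # mu = a...a is the smallest word of its length; its predecessor is
        # the largest word of length |mu| - 1.
        return alphabet[-1] * (len(mu) - 1)
    digits[i] -= 1
    return "".join(alphabet[d] for d in digits)


print([ll_predecessor(w, "ab") for w in ["ab", "ba", "aa", "a"]])
# ['aa', 'ab', 'b', '']
```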

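Choose $w$ from $L \setminus L(I,S)$ as a shortest word, i.e., for all $w' \in L$ with $|w'| < |w|$, we have $w' \in L(I, S)$. Factorize $w = xzy$ with $|x| = |y| = 3m$; note that $|w| \ge m^2 + 6m$ since, otherwise, $w \in I$. Factorize $\mu = \delta_1\alpha\beta\gamma\delta_2$ with $|\delta_1|, |\delta_2| \ge 5m$, $|\alpha\beta\gamma| = m^2$, $\beta \ne \varepsilon$, $\alpha \sim \alpha\beta$, and $\gamma \sim \beta\gamma$, by Lemma 2.1.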

Next, we will use a pumping argument on all factors $\alpha\beta\gamma$ in $z$. As in the proof of Lemma 3.9, this new word has to be created by a series of splicings in $(I, R_\mu)$, and we will show that these splicings can be modified in order to create $w$ from strictly shorter words and with rules from $S$. This will contradict the assumption that $w$ is a shortest word from $L \setminus L(I, S)$.

Let $j$ be a sufficiently large even number ($j > 4 + |z|$ will do). We define a word $\hat z$ which is the result of applying the pumping algorithm from Lemma 2.3 on $z$, as discussed in Section 2.1. The pumping algorithm replaces the occurrences of $\alpha\beta\gamma$ in $z$ by $\alpha\beta^j\gamma$ such that for every factor $z[k, k+m^2] = \alpha\beta\gamma$, either

(a) $\alpha\beta^{j/2}$ is a factor of $\hat z$ starting at position $z[k]$ or

(b) $\beta^{j/2}\gamma$ is a factor of $\hat z$ ending at position $z[k+m^2]$

holds. In particular, if $\delta_1\alpha\beta\gamma\delta_2$ is a factor of $\hat z$, then either (a) $\gamma\delta_2$ is a prefix of a word in $\beta^+$ or (b) $\delta_1\alpha$ is a suffix of a word in $\beta^+$. By induction and as $\alpha\beta\gamma \sim \alpha\beta^j\gamma$, it is easy to see that $\hat z \sim z$ and $x\hat z y \in L$.

Let us trace back the creation of $x\hat z y \in L$ by splicing in $(I, R_\mu)$ to a word $x_1\hat z y_1$ where either $x_1\hat z y_1 \in I$ or $x_1\hat z y_1$ is created by a splicing that affects $\hat z$. Let $z_{k+1} = x_{k+1}\hat z y_{k+1}$, where $x_{k+1} = x$ and $y_{k+1} = y$, be created by $k$ splicings from a word $z_1 = x_1\hat z y_1$ where either $x_1\hat z y_1 \in I$ or $x_1\hat z y_1$ is created by a splicing $(w_0, w_0') \vdash_s z_1$ with $w_0, w_0' \in L$, $s \in R_\mu$, and the bridge of $s$ overlaps with $\hat z$. Furthermore, for $i = 1, \dots, k$ the intermediate splicings are either

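(i) $(w_i, z_i) \vdash_{r_i} x_{i+1}\hat z y_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R_\mu$, $y_{i+1} = y_i$, and the bridge of $r_i$ is covered by the prefix $x_{i+1}$, or

(ii) $(z_i, w_i) \vdash_{r_i} x_{i+1}\hat z y_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R_\mu$, $x_{i+1} = x_i$, and the bridge of $r_i$ is covered by the suffix $y_{i+1}$.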

Following Lemma 3.8 (with $\ell = 3m$), we may assume that $w_1, \dots, w_k \in I$, $r_1, \dots, r_k \in \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<4m}$, thus $r_1, \dots, r_k \in S$, and $|x_1|, |y_1| < 5m$. Furthermore, we may use the same words and rules in order to create $w = x_{k+1} z y_{k+1}$ from $x_1 z y_1$ by splicing. As $w$ does not belong to $L(I, S)$, the word $x_1 z y_1$ must not belong to $L(I, S)$ either. If $z_1$ was in $I$, then $x_1 z y_1 \in I$ as well, as $z$ is at most as long as $\hat z$.

Therefore, $z_1$ is created by a splicing $(w_0, w_0') \vdash_s z_1$ where $s = (u_1, u_2; v)$, $w_0 = x u_1$, and $w_0' = u_2 y$ with $|u_1|, |u_2| < m$, by Lemma 3.4 (here, $x$ and $y$ are newly chosen words). We have

$$z_1 = x_1\hat z y_1 = x v y$$

where $x$ is a proper prefix of $x_1\hat z$ and $y$ is a proper suffix of $\hat z y_1$. Recall that either $s \in S$ or $v = \mu$.

However, we will see next that if $v = \mu$, there is also a rule $s \in S$ and slightly modified words which can be used in order to create $x_1\hat z y_1$ by splicing. In this case $\mu = \delta_1\alpha\beta\gamma\delta_2$ is a factor of $z_1$. As $|\delta_1|, |\delta_2| \ge 5m > |x_1|, |y_1|$, the factor $\alpha\beta\gamma$ is covered by $\hat z$ and, as such, the pumping algorithm ensured that either (a) $\alpha$ is succeeded by $\beta^{j/2}$ or (b) $\gamma$ is preceded by $\beta^{j/2}$. Due to symmetry, we only consider the former case, in which $\gamma\delta_2$ is a prefix of a word in $\beta^+$. Let us shorten the bridge $v$ such that $s = (u_1, u_2; \delta_1\alpha\gamma\delta_2)$. Note that $s \in S$ (as $\alpha \sim \alpha\beta$ and by Lemma 3.3). Furthermore, as $j$ is large enough, $y = \beta_2\beta^\ell y'$ where $\beta_2$ is the suffix of $\beta$ such that $\gamma\delta_2\beta_2 \in \beta^+$ and $\ell \ge |\gamma|$. Note that this implies that $\beta_2\gamma$ is a prefix of $y$, which allows us to add an additional $\beta$ (as $\gamma \sim \beta\gamma$). Therefore, $(w_0, u_2\beta_2\beta^{\ell+1}y') \vdash_s z_1$ where $u_2\beta_2\beta^{\ell+1}y' \in L$. This observation justifies the assumption that $v \ne \mu$ and $s \in S$, which we make for the remainder of the proof.

Next, we will pump down the factors $\alpha\beta^j\gamma$ to $\alpha\beta\gamma$ in $\hat z$ again. At every position where we pumped up before, we now pump down (in reverse order) in order to obtain the words $\hat x$, $\hat v$, $\hat y$ from the words $x$, $v$, $y$, respectively. The pumping in each step is done as in the proof of Lemma 3.9:

1. If $v$ overlaps with $\beta^j$ (in the factor that we are pumping down) but covers neither $\alpha$ nor $\gamma$, extend $v$ (Lemma 3.2) such that $v = \alpha\beta^j$. Observe that, now, the factor $\alpha\beta^j\gamma$ is covered by either $xv$ or $vy$.

2. If $\alpha\beta^j$ or $\beta^j\gamma$ is covered by one of $x$, $v$, or $y$, then replace this factor by $\alpha\beta$ or $\beta\gamma$, respectively. Otherwise, by symmetry, assume that $\alpha\beta^j\gamma$ is covered by $xv$ and, therefore, we can factorize

$$x = x'\alpha\beta^{j_1}\beta_1, \qquad v = \beta_2\beta^{j_2}\gamma v'$$

where $\beta_1\beta_2 = \beta$ and $j_1 + j_2 + 1 = j$. The results of pumping are the words $x'\alpha\beta_1$ and $\beta_2\gamma v'$.

Let $\hat u_1$ and $\hat u_2$ be the sites of $s$ that may have been altered due to extensions and, by Lemma 3.4, assume $|\hat u_1|, |\hat u_2| < m$. If we used an extension for $v$ in at least one of the steps, then $|\hat v| \le m^2$. No matter whether or not we used an extension, $t = (\hat u_1, \hat u_2; \hat v) \in S$ and $(\hat x\hat u_1, \hat u_2\hat y) \vdash_t x_1 z y_1$. As $|\hat x\hat u_1|, |\hat u_2\hat y| < |z| + 6m = |w|$, the words $\hat x\hat u_1$ and $\hat u_2\hat y$ belong to $L(I, S)$. We conclude that $x_1 z y_1$ as well as $w$ belong to $L(I, S)$, which is the desired contradiction. □

4 The Case of Classical Splicing 

In this section, we consider the splicing operation as defined in [18]. This is the most commonly used definition of splicing in formal language theory. The notation we use has been employed in previous papers, see e.g., [2,9]. Throughout this section, a quadruplet of words $r = (u_1, v_1; u_2, v_2) \in (\Sigma^*)^4$ is called a (splicing) rule. The words $u_1v_1$ and $u_2v_2$ are called the left and right site of $r$, respectively. This splicing rule can be applied to two words $w_1 = x_1u_1v_1y_1$ and $w_2 = x_2u_2v_2y_2$, each containing one of the sites, in order to create the new word $z = x_1u_1v_2y_2$, see Figure 8. This operation is called splicing and it is denoted by $(w_1, w_2) \vdash_r z$. The splicing position of this splicing is $z[|x_1u_1|]$, that is, the position between the factors $x_1u_1$ and $v_2y_2$ in $z$.
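The definition can be transcribed directly. The following Python sketch (hypothetical function name `splice`, illustrative and not optimized) enumerates all occurrences of the two sites and returns every word $x_1u_1v_2y_2$ that can be obtained; the length of the kept prefix $x_1u_1$ is the splicing position.

```python
def splice(w1: str, w2: str, rule: tuple) -> set:
    """Classical splicing: all z = x1 u1 v2 y2 such that w1 = x1 u1 v1 y1,
    w2 = x2 u2 v2 y2, and rule = (u1, v1, u2, v2)."""
    u1, v1, u2, v2 = rule
    left_site, right_site = u1 + v1, u2 + v2
    results = set()
    for i in range(len(w1) - len(left_site) + 1):
        if not w1.startswith(left_site, i):
            continue
        x1u1 = w1[: i + len(u1)]   # kept part of w1; its length is the splicing position
        for j in range(len(w2) - len(right_site) + 1):
            if w2.startswith(right_site, j):
                results.add(x1u1 + w2[j + len(u2):])   # append v2 y2
    return results


print(splice("xUVy", "pQRs", ("U", "V", "Q", "R")))  # {'xURs'}
```

In contrast to the Pixton variant, both $u_1$ and $v_2$ survive in the result, which is the source of the extra work in this section.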



Figure 8: Splicing of the words $x_1u_1v_1y_1$ and $x_2u_2v_2y_2$ by the rule $r = (u_1, v_1; u_2, v_2)$.
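Just as in Section 3, for a rule $r$ we define the splicing operator $\sigma_r$ such that for a language $L$

$$\sigma_r(L) = \{z \in \Sigma^* \mid \exists w_1, w_2 \in L \colon (w_1, w_2) \vdash_r z\}$$

and for a set of splicing rules $R$, we let

$$\sigma_R(L) = \bigcup_{r \in R} \sigma_r(L).$$

The reflexive and transitive closure of the splicing operator $\sigma_R$ is given by

$$\sigma_R^*(L) = \bigcup_{i \ge 0} \sigma_R^i(L).$$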
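The closure $\sigma_R^*(I)$ is in general infinite, so it cannot be listed explicitly. The Python sketch below iterates $\sigma_R$ starting from a finite set of axioms, assuming cumulative iterates (each round keeps the old words and adds all new splicing results), and discards words longer than a chosen bound `max_len`. It therefore returns only a subset of the words of $L(I,R)$ of length at most `max_len`: words that can only be reached through longer intermediate words are missed. All names are hypothetical.

```python
def splice(w1, w2, rule):
    """Classical splicing (u1, v1; u2, v2): all x1 u1 v2 y2."""
    u1, v1, u2, v2 = rule
    results = set()
    for i in range(len(w1) - len(u1 + v1) + 1):
        if w1.startswith(u1 + v1, i):
            for j in range(len(w2) - len(u2 + v2) + 1):
                if w2.startswith(u2 + v2, j):
                    results.add(w1[: i + len(u1)] + w2[j + len(u2):])
    return results


def bounded_closure(axioms, rules, max_len):
    """Under-approximation of L(I, R) restricted to words of length <= max_len."""
    lang = {w for w in axioms if len(w) <= max_len}
    while True:
        new = set()
        for rule in rules:
            for w1 in lang:
                for w2 in lang:
                    for z in splice(w1, w2, rule):
                        if len(z) <= max_len and z not in lang:
                            new.add(z)
        if not new:          # fixpoint reached within the length bound
            return lang
        lang |= new


print(sorted(bounded_closure({"ab"}, [("a", "", "", "a")], 5), key=len))
# ['ab', 'aab', 'aaab', 'aaaab']
```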

A finite set of axioms $I \subseteq \Sigma^*$ and a finite set of splicing rules $R \subseteq (\Sigma^*)^4$ form a splicing system $(I,R)$. Every splicing system $(I,R)$ generates a language $L(I,R) = \sigma_R^*(I)$. Note that $L(I,R)$ is the smallest language which is closed under the splicing operator $\sigma_R$ and includes $I$. It is known that the language generated by a splicing system is regular, see [6,17]. A (regular) language $L$ is called a splicing language if a splicing system $(I,R)$ exists such that $L = L(I,R)$.

A rule $r$ is said to respect a language $L$ if $\sigma_r(L) \subseteq L$. It is easy to see that for any splicing system $(I,R)$, every rule $r \in R$ respects the generated language $L(I,R)$. Moreover, a rule $r \notin R$ respects $L(I,R)$ if and only if $L(I, R \cup \{r\}) = L(I,R)$. We say a splicing $(w_1, w_2) \vdash_r z$ respects a language $L$ if $w_1, w_2 \in L$ and $r$ respects $L$; obviously, this implies $z \in L$, too.
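The following brute-force Python sketch does not decide whether a rule respects $L$; it only searches for a counterexample among short words. It assumes a membership oracle `member` for $L$. A returned triple $(w_1, w_2, z)$ with $w_1, w_2 \in L$ and $z \notin L$ proves that the rule does not respect $L$, whereas `None` proves nothing. The function names and the example language $a^*b^*$ are hypothetical choices for illustration.

```python
import re
from itertools import product


def splice(w1, w2, rule):
    """Classical splicing (u1, v1; u2, v2): all x1 u1 v2 y2."""
    u1, v1, u2, v2 = rule
    results = set()
    for i in range(len(w1) - len(u1 + v1) + 1):
        if w1.startswith(u1 + v1, i):
            for j in range(len(w2) - len(u2 + v2) + 1):
                if w2.startswith(u2 + v2, j):
                    results.add(w1[: i + len(u1)] + w2[j + len(u2):])
    return results


def find_counterexample(rule, member, alphabet, max_len):
    """Search words of length <= max_len for w1, w2 in L and a splicing result
    z not in L. A returned triple refutes 'rule respects L'; None proves nothing."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    in_lang = [w for w in words if member(w)]
    for w1 in in_lang:
        for w2 in in_lang:
            for z in sorted(splice(w1, w2, rule)):
                if not member(z):
                    return w1, w2, z
    return None


def in_a_star_b_star(w):
    """Membership oracle for the example language L = a*b*."""
    return re.fullmatch(r"a*b*", w) is not None


print(find_counterexample(("a", "", "", "a"), in_a_star_b_star, "ab", 4))  # None
print(find_counterexample(("b", "", "", "a"), in_a_star_b_star, "ab", 4))  # ('b', 'a', 'ba')
```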

The main result of this section states that, if a regular language $L$ is a splicing language, then it is generated by a particular splicing system $(I,R)$ whose axioms and rules obey length bounds that depend only on the size of the syntactic monoid of $L$.



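Theorem 4.1. Let $L$ be a splicing language and $m = |M_L|$. The splicing system $(I,R)$ with $I = \Sigma^{<m^2+6m} \cap L$ and

$$R = \left\{ r \in \Sigma^{<m^2+10m} \times \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<m^2+10m} \;\middle|\; r \text{ respects } L \right\}$$

generates the language $L = L(I,R)$.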



As the language generated by the splicing system $(I,R)$ is constructible, Theorem 4.1 implies that the problem whether or not a given regular language is a splicing language is decidable. A detailed discussion of the decidability result is given in Section 5.
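To see why Theorem 4.1 leads to a finite search, one can count the candidate axioms and rules it permits. The Python sketch below only computes the sizes of $\Sigma^{<m^2+6m}$ and of $\Sigma^{<m^2+10m} \times \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<m^2+10m}$ for an alphabet with $k$ letters, before filtering by membership in $L$ or by "respects $L$". It is not the decision procedure of Section 5, and the function names are hypothetical.

```python
def count_shorter_than(bound: int, k: int) -> int:
    """|Sigma^{<bound}| for an alphabet with k letters."""
    return sum(k ** i for i in range(bound))


def theorem_4_1_search_space(m: int, k: int) -> dict:
    """Sizes of the candidate sets in Theorem 4.1 for m = |M_L| and |Sigma| = k."""
    axiom_bound = m * m + 6 * m    # I is a subset of Sigma^{<m^2+6m}
    outer = m * m + 10 * m         # bound for the first and fourth components
    inner = 2 * m                  # bound for the second and third components
    return {
        "max_axiom_length": axiom_bound - 1,
        "candidate_axioms": count_shorter_than(axiom_bound, k),
        "candidate_rules": count_shorter_than(outer, k) ** 2
        * count_shorter_than(inner, k) ** 2,
    }


print(theorem_4_1_search_space(1, 2))
# {'max_axiom_length': 6, 'candidate_axioms': 127, 'candidate_rules': 37711881}
```

Even for tiny parameters the numbers grow quickly, so the sketch illustrates finiteness rather than practicality.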

Let $L$ be a formal language. Clearly, every set of words $J \subseteq L$ together with a set of rules $S$ in which every rule respects $L$ generates a subset $L(J,S) \subseteq L$. Therefore, in Theorem 4.1 the inclusion $L(I,R) \subseteq L$ is obvious. The rest of this section is devoted to the proof of the converse inclusion $L \subseteq L(I,R)$. The proof uses many ideas that have been employed in Section 3. However, there are some challenges that arise only in the classical splicing variant. The additional complexity comes from having to handle the first and fourth components of rules, which in the case of classical splicing occur both in the words used for splicing and in the splicing result. In contrast, in Pixton splicing the sites of a rule do not occur in the splicing result, whereas the bridge need not be a factor of the words used for splicing.

The structure of this section is the same as that of Section 3. In Section 4.1 we present techniques to obtain rules that respect a regular language $L$ from other rules that respect $L$, and we show how we can modify a splicing step such that the words used for splicing are not significantly longer than the splicing result; similar results can be found in [8,9]. In Section 4.2 we use these techniques to show that a long word $z \in L$ can be obtained by a series of splicings from a set of shorter words from $L$ by using rules which satisfy certain length restrictions. Finally, in Section 4.3 we prove Theorem 4.1.



4.1 Rule Modifications 

The first lemma states that we can extend the sites of a rule $r$ such that the extended rule respects all languages that are respected by $r$.

Lemma 4.2. Let $r = (u_1, v_1; u_2, v_2)$ be a rule which respects a language $L$. For every word $x$, the rules $(xu_1, v_1; u_2, v_2)$, $(u_1, v_1x; u_2, v_2)$, $(u_1, v_1; xu_2, v_2)$, and $(u_1, v_1; u_2, v_2x)$ respect $L$ as well.

Proof. Let $s$ be any of the rules $(xu_1, v_1; u_2, v_2)$, $(u_1, v_1x; u_2, v_2)$, $(u_1, v_1; xu_2, v_2)$, $(u_1, v_1; u_2, v_2x)$. In order to prove that $s$ respects $L$, we have to show that, for all $w_1, w_2 \in L$ and $z \in \Sigma^*$ such that $(w_1, w_2) \vdash_s z$, we have $z \in L$, too. Indeed, if $(w_1, w_2) \vdash_s z$, then $(w_1, w_2) \vdash_r z$ and, as $r$ respects $L$, we conclude $z \in L$. □
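The proof rests on the observation that every splicing by an extended rule $s$ is also a splicing by $r$ with the same result. The following Python sketch checks this inclusion exhaustively for all ordered pairs of short words; it is a sanity check on examples, not a proof, and the helper names are hypothetical.

```python
from itertools import product


def splice(w1, w2, rule):
    """Classical splicing (u1, v1; u2, v2): all x1 u1 v2 y2."""
    u1, v1, u2, v2 = rule
    results = set()
    for i in range(len(w1) - len(u1 + v1) + 1):
        if w1.startswith(u1 + v1, i):
            for j in range(len(w2) - len(u2 + v2) + 1):
                if w2.startswith(u2 + v2, j):
                    results.add(w1[: i + len(u1)] + w2[j + len(u2):])
    return results


def extensions(rule, x):
    """The four extensions of Lemma 4.2 by the word x."""
    u1, v1, u2, v2 = rule
    return [(x + u1, v1, u2, v2), (u1, v1 + x, u2, v2),
            (u1, v1, x + u2, v2), (u1, v1, u2, v2 + x)]


def extension_is_subsumed(rule, x, alphabet, max_len):
    """Check that no extension s yields a splicing result that rule does not,
    for all ordered pairs of words of length <= max_len."""
    words = ["".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n)]
    return all(splice(w1, w2, s) <= splice(w1, w2, rule)
               for s in extensions(rule, x) for w1 in words for w2 in words)


print(extension_is_subsumed(("a", "b", "b", "a"), "ab", "ab", 4))  # True
```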

Henceforth, for a rule $r = (u_1, v_1; u_2, v_2)$, we will refer to the rules $(xu_1, v_1; u_2, v_2)$ and $(u_1, v_1x; u_2, v_2)$ as extensions of the left site of $r$ and to $(u_1, v_1; xu_2, v_2)$ and $(u_1, v_1; u_2, v_2x)$ as extensions of the right site of $r$.

Next, for a language $L$, let us investigate the syntactic class of a rule $r = (u_1, v_1; u_2, v_2)$. The syntactic class (with respect to $L$) of $r$ is the set of rules $[r]_L = [u_1]_L \times [v_1]_L \times [u_2]_L \times [v_2]_L$, and two rules $r$ and $s$ are syntactically congruent (with respect to $L$), denoted by $r \sim_L s$, if $s \in [r]_L$.

Lemma 4.3. Let $r$ be a rule which respects a language $L$. Every rule $s \in [r]_L$ respects $L$.

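Proof. Let $r = (u_1, v_1; u_2, v_2)$ and $s = (\hat u_1, \hat v_1; \hat u_2, \hat v_2)$. Thus, $u_i \sim_L \hat u_i$ and $v_i \sim_L \hat v_i$ for $i = 1, 2$. For $\hat w_1 = x_1\hat u_1\hat v_1y_1 \in L$ and $\hat w_2 = x_2\hat u_2\hat v_2y_2 \in L$ we have to show that $\hat z = x_1\hat u_1\hat v_2y_2 \in L$. For $i = 1, 2$, let $w_i = x_iu_iv_iy_i$ and note that $w_i \sim_L \hat w_i$; hence, $w_i \in L$. Furthermore, $(w_1, w_2) \vdash_r x_1u_1v_2y_2 = z \in L$ as $r$ respects $L$, and $\hat z \in L$ as $z \sim_L \hat z$. □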

Consider a splicing $(x_1u_1v_1y_1, x_2u_2v_2y_2) \vdash_r x_1u_1v_2y_2$ which respects a regular language $L$, as shown on the left side of Figure 9. The factors $v_1y_1$ and $x_2u_2$ may be relatively long, but they do not occur as factors in the resulting word $x_1u_1v_2y_2$. In particular, it is possible that two long words are spliced and the outcome is a relatively short word. Using Lemmas 4.2 and 4.3, we can find shorter words in $L$ and a modified splicing rule which can be used to obtain $x_1u_1v_2y_2$.



Figure 9: Replacing $v_1y_1$ and $x_2u_2$ by short words.
Lemma 4.4. Let $r = (u_1, v_1; u_2, v_2)$ be a rule which respects a regular language $L$, and let $w_1 = x_1u_1v_1y_1 \in L$ and $w_2 = x_2u_2v_2y_2 \in L$. There is a rule $s = (u_1, \hat v_1; \hat u_2, v_2)$ which respects $L$ and there are words $\hat w_1 = x_1u_1\hat v_1 \in L$ and $\hat w_2 = \hat u_2v_2y_2 \in L$ such that $|\hat v_1|, |\hat u_2| < |M_L|$. More precisely, $\hat v_1 \in [v_1y_1]_L$ and $\hat u_2 \in [x_2u_2]_L$.

In particular, whenever $(w_1, w_2) \vdash_r x_1u_1v_2y_2 = z$, then there is a splicing $(\hat w_1, \hat w_2) \vdash_s z$ which respects $L$ where $\hat w_1$, $\hat w_2$, and $s$ have the properties described above.

Proof. By Lemma 4.2, the rule $(u_1, v_1y_1; x_2u_2, v_2)$ respects $L$. Choose $\hat v_1 \in [v_1y_1]_L$ and $\hat u_2 \in [x_2u_2]_L$ as shortest words from these sets, respectively. By Lemma 2.2, $|\hat v_1|, |\hat u_2| < |M_L|$, and $\hat w_1 = x_1u_1\hat v_1 \in L$, $\hat w_2 = \hat u_2v_2y_2 \in L$. Furthermore, by Lemma 4.3, $s = (u_1, \hat v_1; \hat u_2, v_2)$ respects $L$. □
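The proof chooses shortest words in syntactic classes. Assuming $L$ is given by a complete minimal DFA, a breadth-first search over its transition monoid (which is isomorphic to the syntactic monoid $M_L$) yields a shortest representative of every class, and the number of classes found is $|M_L|$. The Python sketch below is illustrative; the DFA encoding and the function names are hypothetical.

```python
from collections import deque


def transformation(w, states, delta):
    """State map induced by w: p -> delta*(p, w)."""
    images = []
    for p in states:
        q = p
        for c in w:
            q = delta[(q, c)]
        images.append(q)
    return tuple(images)


def shortest_representatives(states, alphabet, delta):
    """Breadth-first search over the transition monoid. Maps every monoid element
    to a shortest word inducing it; the number of entries is the monoid size."""
    reps = {transformation("", states, delta): ""}
    queue = deque([""])
    while queue:
        w = queue.popleft()
        for c in alphabet:
            t = transformation(w + c, states, delta)
            if t not in reps:          # first visit in BFS gives a shortest word for t
                reps[t] = w + c
                queue.append(w + c)
    return reps


# Example: L = words over {a, b} with an even number of a's.
STATES = [0, 1]
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
reps = shortest_representatives(STATES, "ab", DELTA)
m = len(reps)                                          # |M_L| = 2
print(m, reps[transformation("bab", STATES, DELTA)])   # 2 a
```

In this example a factor $v_1y_1$ = `bab` would be replaced by its shortest representative `a`, and every representative has length less than $|M_L| = 2$, in line with the bound of Lemma 2.2 used above.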

4.2 Series of Splicings 

Consider the creation of words by a series of splicings. Let us begin with a simple observation. In the case when a word is created by two (or more) successive splicings, but none of the splicing sites overlaps the position of the other splicing, the order of these splicings is irrelevant. Recall that the splicing position of a splicing $(w_1, w_2) \vdash_r z$ with $r = (u_1, v_1; u_2, v_2)$ is the position between the factors $u_1$ and $v_2$ in $z$. The notation in Remark 4.5 is the same as in Figure 10.



Figure 10: The word $x_1u_1z_2v_3y_3$ can be created either by using the right splicing first or by using the left splicing first.

Remark 4.5. Let $w_1 = x_1u_1v_1y_1$, $w_2 = x_2u_2z_2v_2'y_2$, where $v_2$ is a prefix of $z_2$ and $u_2'$ is a suffix of $z_2$, and $w_3 = x_3u_3v_3y_3$ be words, and let $r_1 = (u_1, v_1; u_2, v_2)$ and $r_2 = (u_2', v_2'; u_3, v_3)$ be rules. In order to create the word $z = x_1u_1z_2v_3y_3$ by splicing, we may use the splicings

$$(w_1, w_2) \vdash_{r_1} x_1u_1z_2v_2'y_2 = z', \qquad (z', w_3) \vdash_{r_2} z \qquad \text{or}$$

$$(w_2, w_3) \vdash_{r_2} x_2u_2z_2v_3y_3 = z'', \qquad (w_1, z'') \vdash_{r_1} z.$$
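Remark 4.5 can be checked on a concrete instance. The following Python sketch uses hypothetical single letters, chosen so that every site occurs exactly once, applies the two splicings in both orders, and obtains the same word $x_1u_1z_2v_3y_3$ either way.

```python
def splice(w1, w2, rule):
    """Classical splicing (u1, v1; u2, v2): all x1 u1 v2 y2."""
    u1, v1, u2, v2 = rule
    results = set()
    for i in range(len(w1) - len(u1 + v1) + 1):
        if w1.startswith(u1 + v1, i):
            for j in range(len(w2) - len(u2 + v2) + 1):
                if w2.startswith(u2 + v2, j):
                    results.add(w1[: i + len(u1)] + w2[j + len(u2):])
    return results


# Concrete instance of Remark 4.5 (hypothetical letters).
x1, u1, v1, y1 = "p", "A", "B", "q"
x2, u2, y2 = "r", "C", "s"
v2, u2p, v2p = "D", "E", "F"          # v2 is a prefix, u2' a suffix of z2
z2 = v2 + "m" + u2p
x3, u3, v3, y3 = "t", "G", "H", "u"
w1 = x1 + u1 + v1 + y1
w2 = x2 + u2 + z2 + v2p + y2
w3 = x3 + u3 + v3 + y3
r1 = (u1, v1, u2, v2)
r2 = (u2p, v2p, u3, v3)

# Left splicing first, then right splicing.
left_first = {z for zp in splice(w1, w2, r1) for z in splice(zp, w3, r2)}
# Right splicing first, then left splicing.
right_first = {z for zpp in splice(w2, w3, r2) for z in splice(w1, zpp, r1)}
print(left_first, right_first)  # {'pADmEHu'} {'pADmEHu'}
```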

Now, consider a word $z$ which is created by two successive splicings from words $w_i = x_iu_iv_iy_i$ for $i = 1, 2, 3$ as in Figure 11. If no factor of $w_1$ is a part of $z$, then we can find another splicing rule $s$ such that $(w_3, w_2) \vdash_s z$. This replacement will become crucial in the proof of Lemma 4.7.

Figure 11: If no part of $x_1u_1v_1y_1$ is a factor of the splicing result, then the two splicings can be reduced to one splicing.

Lemma 4.6. Let $L$ be a language, let $w_i = x_iu_iv_iy_i \in L$ for $i = 1, 2, 3$, and let $r_1 = (u_1, v_1; u_2, v_2)$ and $r_2 = (u_3, v_3; u_4, v_4)$ be rules respecting $L$. If there are splicings

$$(w_1, w_2) \vdash_{r_1} x_1u_1v_2y_2 = w_4 = x_4u_4v_4y_4, \qquad (w_3, w_4) \vdash_{r_2} x_3u_3v_4y_4 = z$$

where $v_4y_4$ is a suffix of $v_2y_2$, then there is a rule $s = (u_3, v_3; u_2\delta, \hat v_4)$, for some words $\delta$ and $\hat v_4$, which respects $L$ and satisfies $(w_3, w_2) \vdash_s z$. Furthermore, $\hat v_4 = v_4$ or $\hat v_4 \le_{\ell\ell} v_2$.

Proof. Extension (Lemma 4.2) justifies the assumption that the factors $u_1v_2$ and $u_4v_4$ match in $w_4$: let $w_4[i, j] = u_1v_2$ and $w_4[i', j'] = u_4v_4$;

• if $i < i'$, we extend $u_4$ in $r_2$ to the left by $i' - i$ letters,

• if $i > i'$, we extend $u_1$ in $r_1$ to the left by $i - i'$ letters,

• if $j < j'$, we extend $v_2$ in $r_1$ to the right by $j' - j$ letters, and

• if $j > j'$, we extend $v_4$ in $r_2$ to the right by $j - j'$ letters.

Clearly, the extended factors $u_1v_2$ and $u_4v_4$ match in $w_4$. As $v_4y_4$ was a suffix of $v_2y_2$ before extension, now $v_4$ is a suffix of $v_2$ and $y_2 = y_4$. Additionally, either $v_4$ was not extended, or $v_4 \le_{\ell\ell} v_2$ and $v_2$ was not extended. Let $\delta$ be such that $\delta v_4 = v_2$, let $s = (u_3, v_3; u_2\delta, v_4)$, and observe that $(w_3, w_2) \vdash_s z$.

Next, let us prove that $s$ respects $L$. Let $w_3' = x_3'u_3v_3y_3' \in L$ and $w_2' = x_2'u_2\delta v_4y_2' = x_2'u_2v_2y_2' \in L$. There are splicings

$$(w_1, w_2') \vdash_{r_1} x_1u_1v_2y_2' = w_4' = x_4u_4v_4y_2', \qquad (w_3', w_4') \vdash_{r_2} x_3'u_3v_4y_2' = z'$$

and $z' \in L$, concluding that $s$ respects $L$. □

Consider a splicing system $(J, S)$ and its generated language $L = L(J, S)$. Let $n$ be the length of the longest word in $J$ and let $\mu$ be the length-lexicographically largest word that is a component of a rule in $S$. Define $W_\mu = \{w \in \Sigma^* \mid w \le_{\ell\ell} \mu\}$ as the set of words which are at most as large as $\mu$ in length-lexicographic order. Furthermore, let $I = \Sigma^{\le n} \cap L$ be a set of axioms and let

$$R = \{r \in W_\mu^4 \mid r \text{ respects } L\}$$

be a set of rules. It is not difficult to see that $J \subseteq I$, $S \subseteq R$, and $L = L(I,R)$. Whenever convenient, we will assume that a splicing language $L$ is generated by a splicing system of the form $(I,R)$.

Now, let us consider a word $xzy \in L$ where the length of the middle factor $z$ is at least …. The creation of $xzy$ by splicing in $(I,R)$ can be traced back to a word $x_1zy_1 = z_1$ where either $z_1 \in I$ or $z_1$ is created by a splicing that affects the factor $z$, i.e., the splicing position lies in the factor $z$. The next lemma describes this creation of $xzy = z_{k+1}$ by $k$ splicings in $(I,R)$, and shows that we can choose the rules and words which are used to create $z_{k+1}$ from $z_1$ such that the words are not significantly longer than $\ell = \max\{|x|, |y|\}$ and such that the rules satisfy certain length restrictions.

Lemma 4.7. Let $L$ be a splicing language, let $\ell, n \in \mathbb{N}$, let $m = |M_L|$, and let $\mu$ be a word with $|\mu| > \ell + 2m$ such that for $I = \Sigma^{\le n} \cap L$ and $R = \{r \in W_\mu^4 \mid r \text{ respects } L\}$ we have $L = L(I,R)$.

Let $z_{k+1} = x_{k+1}zy_{k+1}$ with $|z| >$ … and $|x_{k+1}|, |y_{k+1}| \le \ell$ be a word that is created by $k$ splicings from a word $z_1 = x_1zy_1$ where either $z_1 \in I$ or $z_1$ is created by a splicing $(w_0, w_0') \vdash_s z_1$ where $w_0, w_0' \in L$, $s$ respects $L$, and the splicing position lies in the factor $z$. Furthermore, for $i = 1, \dots, k$ the intermediate splicings are either

(i) $(w_i, z_i) \vdash_{r_i} x_{i+1}zy_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R$, $y_{i+1} = y_i$, and the splicing position lies to the left of the factor $z$, or

(ii) $(z_i, w_i) \vdash_{r_i} x_{i+1}zy_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R$, $x_{i+1} = x_i$, and the splicing position lies to the right of the factor $z$.

There are rules and words creating $z_{k+1}$, as above, satisfying in addition:

1. There is $k' \le k$ such that for $i = 1, \dots, k'$ all splicings are of the form (i) and for $i = k'+1, \dots, k$ all splicings are of the form (ii).

2. For $i = 1, \dots, k'$ the following bounds apply: $|x_i| < \ell + 2m$, $|w_i| < \ell + 2m$, $r_i \in \Sigma^{<\ell+m} \times \Sigma^{<2m} \times \Sigma^{<2m} \times W_\mu$, and $x_{k'+1} = x_{k'+2} = \dots = x_{k+1}$.

3. For $i = k'+1, \dots, k$ the following bounds apply: $|y_i| < \ell + 2m$, $|w_i| < \ell + 2m$, $r_i \in W_\mu \times \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<\ell+m}$, and $y_1 = y_2 = \dots = y_{k'+1}$.

In particular, if $n > \ell + 2m$, then $w_1, \dots, w_k \in I$.

Proof. The first statement follows immediately by Remark 4.5 and the lower bound on $|z|$. The first statement also implies $x_{k'+1} = x_{k'+2} = \dots = x_{k+1}$ and $y_1 = y_2 = \dots = y_{k'+1}$. Note that if $k' = 0$ (or $k' = k$), then statement 2 (resp. statement 3) is trivially true.




Figure 12: The $i$-th splicing for $i \le k'$ in the proof of Lemma 4.7, where $x_{i+1} = u_ix_i'$ and $v_i'$ is a prefix of $x_i'z$.

The notation we employ in order to prove statement 2 is chosen such that it matches Figure 12. For $i = 1, \dots, k'$, let $r_i = (u_i, v_i; u_i', v_i')$. By extension (Lemma 4.2), we may assume that $w_i = u_iv_i$ and $x_i = u_i'x_i'$ such that $x_{i+1} = u_ix_i'$ and $v_i'$ is a prefix of $x_i'z$. Let $x_{k'+1}' = x_{k'+1} = x_{k+1}$ and $u_{k'+1}' = \varepsilon$. By Lemma 4.6, we may assume that every splicing position lies to the left of the previous splicing position, i.e., $x_i'$ is a proper suffix of $x_{i+1}'$, and $|x_i'| < \ell$ as $|x_{k'+1}'| \le \ell$. Due to the modifications we made, we may have lost control of the lengths of $u_i$, $v_i$, and $u_i'$; but $v_i'$ still belongs to $W_\mu$ and $r_i$ respects $L$. Let $\delta_{i+1}$ be such that $x_{i+1}' = \delta_{i+1}x_i'$; hence, $u_i = u_{i+1}'\delta_{i+1}$. The factor $\delta_{i+1}$ is the part of $x_{k'+1}$ which is added by the $i$-th splicing and is not modified afterwards; $x_{k'+1} = \delta_{k'+1} \cdots \delta_2 x_1'$. Now, for $i = 2, \dots, k'$, we replace $u_i'$ by a shortest word from $[u_i']_L$. (We also replace this prefix of $x_i$ and of $u_{i-1}$.) Furthermore, we replace $v_i$ by a shortest word from $[v_i]_L$ for $i = 1, \dots, k'$. By Lemma 2.2, we have $|u_i'|, |v_i| < m$. We do not replace $u_1'$ yet, as this might affect the word $w_0$ and the rule $s$ in the splicing $(w_0, w_0') \vdash_s x_1zy_1$.

Observe that the words $z_i$, $w_i$, and the rules $r_i$ can still be used to create $z_{k+1}$ by splicing, in the way described in the claim. For $i = 2, \dots, k'$, we have $|x_i| = |u_i'x_i'| < \ell + m$, $|w_i| \le |x_{i+1}| + |v_i| < \ell + 2m$, and $r_i \in \Sigma^{<\ell+m} \times \Sigma^{<m} \times \Sigma^{<m} \times W_\mu$. We also have $|w_1| < \ell + 2m$ and $r_1 \in \Sigma^{<\ell+m} \times \Sigma^{<m} \times \Sigma^* \times W_\mu$. Note that, except for the length of $x_1$ and the third component of $r_1$, we have proven statement 2 (of the lemma), and we have actually proven a stronger bound than claimed. Symmetrically, we can consider statement 3 to be proven except for $y_1 = y_{k'+1}$ and the second component of $r_{k'+1}$.

Let $x_1 = u_1'x_1'$ as above and, symmetrically, let $y_1 = y_{k'+1}'v_{k'+1}'$, where $v_{k'+1}'$ is the second component of $r_{k'+1}$. If $k' = 0$ (or $k' = k$), then $u_1'$ (resp. $v_{k'+1}'$) can be considered empty and $x_1' = x_{k+1}$ (resp. $y_{k'+1}' = y_{k+1}$). If $z_1 \in I$, we replace $u_1'$ and $v_{k'+1}'$ by shortest words from their syntactic classes, respectively, and the claim holds by Lemma 2.2. Otherwise, $(w_0, w_0') \vdash_s z_1$ where $u_1'$ is a prefix of $w_0$ and $v_{k'+1}'$ is a suffix of $w_0'$.

Let $s = (u_0, v_0; u_0', v_0')$ and consider the overlap of the factor $u_0$ in the splicing $(w_0, w_0') \vdash_s z_1$ with the prefix $u_1'$ of $w_0$. In case $u_0$ does not overlap with $u_1'$, replace $u_1'$ by a shortest word from its syntactic class. If $u_0$ and $u_1'$ overlap, let $u_1' = \delta_1\delta_2$ such that $\delta_2$ is the overlap and replace $\delta_1$ and $\delta_2$ by shortest words from their syntactic classes, respectively. Note that if we modified $u_0$, it got shorter; hence, $s$ still belongs to $R$. In any case, $|u_1'| < 2m$, $|x_1| < \ell + 2m$ (Lemma 2.2), and $r_1 \in \Sigma^{<\ell+m} \times \Sigma^{<m} \times \Sigma^{<2m} \times W_\mu$; thus, the second statement holds.

We may treat $v_{k'+1}'$ and $r_{k'+1}$ symmetrically in order to prove statement 3. □
4.3 Proof of Theorem 4.1

Let $L$ be a splicing language and $m = |M_L|$. Throughout this section, by $\sim$ we denote the equivalence relation $\sim_L$ and by $[\cdot]$ we denote the corresponding equivalence classes $[\cdot]_L$.

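Recall that Theorem 4.1 claims that the splicing system $(I,R)$ with $I = \Sigma^{<m^2+6m} \cap L$ and

$$R = \left\{ r \in \Sigma^{<m^2+10m} \times \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<m^2+10m} \;\middle|\; r \text{ respects } L \right\}$$

generates $L$. The proof is divided into two parts. In the first part, Lemma 4.8, we prove that the set of rules can be chosen as $\{r \in (\Sigma^{<m^2+10m})^4 \mid r \text{ respects } L\}$ for some finite set of axioms.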

The second part concludes the proof of Theorem 4.1 by employing the length bound $2m$ for the second and third components of rules and by proving that the set of axioms can be chosen as $I = \Sigma^{<m^2+6m} \cap L$.

Lemma 4.8. Let $L$ and $m$ be as above. There exists $n \in \mathbb{N}$ such that the splicing system $(I,R)$ with $I = \Sigma^{\le n} \cap L$ and

$$R = \left\{ r \in (\Sigma^{<m^2+10m})^4 \;\middle|\; r \text{ respects } L \right\}$$

generates the same language $L = L(I,R)$.

Proof. As every word in $I$ belongs to $L$ and every rule in $R$ respects $L$, the inclusion $L(I,R) \subseteq L$ holds (for any $n$).

Since $L$ is a splicing language, there exists a splicing system $(I', R')$ which generates $L$. Let $n'$ be a number larger than the length of any word in $I'$ and larger than the length of any component of a rule in $R'$, and let $n = n' + 6m$. Let $I = \Sigma^{\le n} \cap L$ as in the claim and observe that $L(I, R') = L$.

For a word $\mu$ we let $W_\mu = \{w \in \Sigma^* \mid w \le_{\ell\ell} \mu\}$, as we did before. Define the set of rules where every component is length-lexicographically bounded by $\mu$,

$$R_\mu = \{r \in W_\mu^4 \mid r \text{ respects } L\},$$

and the language $L_\mu = L(I, R_\mu)$; clearly, $L_\mu \subseteq L$. For two words $\mu \le_{\ell\ell} \nu$ we see that $R_\mu \subseteq R_\nu$ and, hence, $L_\mu \subseteq L_\nu$. Thus, if $L_\mu = L$ for some word $\mu$, then for all words $\nu$ with $\mu \le_{\ell\ell} \nu$ we have $L_\nu = L$. As $L = L(I, R')$, there exists a word $\mu$ such that $L_\mu = L$ and $|\mu| + 6m < n$. Let $\mu$ be the smallest word, in the length-lexicographic order, such that $L_\mu = L$. Note that if $|\mu| < m^2 + 10m$, then $R_\mu \subseteq R$ and $L = L_\mu \subseteq L(I,R)$. For the sake of contradiction, assume $|\mu| \ge m^2 + 10m$. Let $\nu$ be the next-smaller word than $\mu$ in the length-lexicographic order, and let $S = R_\nu$. Note that $L(I, S) \subsetneq L$ and $R_\mu \setminus S$ contains only rules which have a component that is equal to $\mu$.

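Choose $w$ from $L \setminus L(I, S)$ as a shortest word, i.e., for all $w' \in L$ with $|w'| < |w|$, we have $w' \in L(I, S)$. Factorize $w = xzy$ with $|x| = |y| = 3m$ and note that $|z| > |\mu|$, since otherwise $w \in I$. Factorize $\mu = \delta_1\alpha\beta\gamma\delta_2$ with $|\delta_1|, |\delta_2| \geq 5m$, $|\alpha\beta\gamma| = m^2$, $\beta \neq \varepsilon$, $\alpha \sim \alpha\beta$, and $\gamma \sim \beta\gamma$ (Lemma 2.1).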
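Once syntactic equivalence is computable, a factorization of the kind supplied by Lemma 2.1 can be found by brute force. The sketch below assumes $L$ is given by a complete DFA and uses the standard fact that, for the minimal DFA, two words are syntactically equivalent exactly when they induce the same state transformation. For a non-minimal DFA, equal transformations still imply syntactic equivalence, so any factorization returned is valid, although the length bound of Lemma 2.1 refers to the syntactic monoid. The DFA encoding and all names are assumptions of this sketch.

```python
# Illustrative sketch; the DFA encoding and names are assumptions.
from typing import Dict, Optional, Tuple

Delta = Dict[Tuple[int, str], int]  # complete DFA: (state, letter) -> state


def transformation(delta: Delta, n_states: int, word: str) -> Tuple[int, ...]:
    """The map q -> delta*(q, word), as a tuple indexed by state."""
    f = list(range(n_states))
    for c in word:
        f = [delta[(q, c)] for q in f]
    return tuple(f)


def find_pumpable_factorization(delta: Delta, n_states: int,
                                s: str) -> Optional[Tuple[str, str, str]]:
    """Return (alpha, beta, gamma) with s = alpha beta gamma, beta nonempty,
    alpha ~ alpha beta and gamma ~ beta gamma (w.r.t. the congruence induced
    by the DFA), or None if there is no such factorization."""
    def t(w: str) -> Tuple[int, ...]:
        return transformation(delta, n_states, w)

    for a in range(len(s)):
        for b in range(a + 1, len(s) + 1):
            alpha, beta, gamma = s[:a], s[a:b], s[b:]
            if t(alpha) == t(alpha + beta) and t(gamma) == t(beta + gamma):
                return alpha, beta, gamma
    return None


# Minimal DFA for "even number of a's" over {a, b}; here |M_L| = 2, so m^2 = 4.
EVEN_A: Delta = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}

if __name__ == "__main__":
    print(find_pumpable_factorization(EVEN_A, 2, "abab"))  # ('', 'aba', 'b')
```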

We will show that there is a series of splicings which creates $w$ from a set of shorter words by using splicing rules from $S$. This yields a contradiction to the choice of $w$. In order to find this series of splicings, we investigate the creation of a word $x\tilde{z}y$ where $\tilde{z}$ is derived by using a pumping argument on all factors $\alpha\beta\gamma$ in $z$.

Let $j$ be a sufficiently large even number ($j > 4 + |z|$ will suffice). We define a word $\tilde{z}$ which is the result of applying the pumping algorithm from Lemma 2.3 on $z$, as discussed in Section 2.1. The pumping algorithm replaces the occurrences of $\alpha\beta\gamma$ in $z$ by $\alpha\beta^j\gamma$ such that for every factor $z[k, k + m^2] = \alpha\beta\gamma$, either

(a) $\alpha\beta^{j/2}$ is a factor of $\tilde{z}$ starting at position $z[k]$ or

(b) $\beta^{j/2}\gamma$ is a factor of $\tilde{z}$ ending at position $z[k + m^2]$

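holds. In particular, if $\delta_1\alpha\beta\gamma\delta_2$ is a factor of $\tilde{z}$, either (a) $\gamma\delta_2$ is a prefix of a word in $\beta^+$ or (b) $\delta_1\alpha$ is a suffix of a word in $\beta^+$. By induction and as $\alpha\beta\gamma \sim \alpha\beta^j\gamma$, it is easy to see that $z \sim \tilde{z}$ and $x\tilde{z}y \in L$.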
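The step $z \sim \tilde{z}$ rests on $\sim$ being a congruence: from $\alpha \sim \alpha\beta$ one gets $\alpha \sim \alpha\beta^j$ for all $j \geq 0$, hence $\alpha\beta^j\gamma \sim \alpha\gamma \sim \alpha\beta\gamma$. The Python sketch below checks this for a single replacement only; it does not implement the full pumping algorithm of Lemma 2.3, which also handles overlapping occurrences. The example language (words over $\{a, b\}$ without the factor $aa$), its DFA, and all names are illustrative assumptions.

```python
# Illustrative sketch; the example DFA and names are assumptions.
from typing import Dict, Tuple

Delta = Dict[Tuple[int, str], int]


def transformation(delta: Delta, n_states: int, word: str) -> Tuple[int, ...]:
    """The map q -> delta*(q, word), as a tuple indexed by state."""
    f = list(range(n_states))
    for c in word:
        f = [delta[(q, c)] for q in f]
    return tuple(f)


def pump(z: str, k: int, alpha: str, beta: str, gamma: str, j: int) -> str:
    """Replace the occurrence of alpha beta gamma at z[k] by alpha beta^j gamma."""
    factor = alpha + beta + gamma
    if z[k:k + len(factor)] != factor:
        raise ValueError("alpha beta gamma does not occur at position k")
    return z[:k] + alpha + beta * j + gamma + z[k + len(factor):]


# Minimal DFA for words over {a, b} without the factor "aa":
# state 0 = last letter is not a, 1 = last letter is a, 2 = sink.
NO_AA: Delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2,
                (1, "b"): 0, (2, "a"): 2, (2, "b"): 2}

if __name__ == "__main__":
    alpha, beta, gamma = "b", "ab", "b"
    # alpha ~ alpha beta: "b" and "bab" induce the same transformation.
    assert transformation(NO_AA, 3, alpha) == transformation(NO_AA, 3, alpha + beta)
    z = "ababba"
    for j in range(1, 6):
        z_tilde = pump(z, 1, alpha, beta, gamma, j)
        # Same transformation, hence z ~ z_tilde (and equal membership in L).
        assert transformation(NO_AA, 3, z) == transformation(NO_AA, 3, z_tilde)
    print(pump(z, 1, alpha, beta, gamma, 4))  # abababababba
```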
Let us trace back the creation of $x\tilde{z}y \in L$ by splicing in $(I, R_\mu)$ to a word $x_1\tilde{z}y_1$ where either $x_1\tilde{z}y_1 \in I$ or $x_1\tilde{z}y_1$ is created by a splicing that affects $\tilde{z}$, i.e., the splicing position lies within the factor $\tilde{z}$. Let $z_{k+1} = x_{k+1}\tilde{z}y_{k+1}$, where $x_{k+1} = x$ and $y_{k+1} = y$, be created by $k$ splicings from a word $z_1 = x_1\tilde{z}y_1$ where either $x_1\tilde{z}y_1 \in I$ or $x_1\tilde{z}y_1$ is created by a splicing $(w_0, w'_0) \vdash_s z_1$ with $w_0, w'_0 \in L$, $s \in R_\mu$, and the splicing position lies in the factor $\tilde{z}$. Furthermore, for $i = 1, \ldots, k$ the intermediate splicings are either

(i) $(w_i, z_i) \vdash_{r_i} x_{i+1}\tilde{z}y_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R_\mu$, $y_{i+1} = y_i$, and the splicing position lies to the left of the factor $\tilde{z}$, or

(ii) $(z_i, w_i) \vdash_{r_i} x_{i+1}\tilde{z}y_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in R_\mu$, $x_{i+1} = x_i$, and the splicing position lies to the right of the factor $\tilde{z}$.

Note that $|\tilde{z}| > |z| > |\mu| \geq m^2 + 10m$ and, therefore, we can apply Lemma 4.7 (with $\ell = 3m$). Thus, we may assume that $w_i \in I$ and $|x_i|, |y_i| < 5m$ for $i = 1, \ldots, k$.

Consider a rule $r_i$ in a splicing of the form (i). By Lemma 4.7, $r_i \in \Sigma^{<5m} \times \Sigma^{<2m} \times \Sigma^{<2m} \times W_\mu$.

Suppose the fourth component of $r_i$ covers a prefix of the left-most factor in $\tilde{z}$ which is longer than $\alpha$ (as $j$ is very large, it cannot fully cover this factor). By extension (Lemma 4.2), we may write $r_i = (u_1, v_1; u_2, v'\alpha\beta^h)$ for some $h \geq 1$. By Lemma 4.3 and as $\alpha \sim \alpha\beta$, we may replace this rule by $(u_1, v_1; u_2, v'\alpha)$. Note that, as the fourth component got shorter, now $r_i \in S$.

After we have treated rules of form (ii) symmetrically, these new rules $r_1, \ldots, r_k$ and the words $w_1, \ldots, w_k$ can be used in order to create $w = x_{k+1}zy_{k+1}$ from $x_1zy_1$ by splicing. In order to see this, observe that, even though the factors $\alpha\beta\gamma$ in $z$, which we pumped up before, may overlap with each other, the left-most (and right-most) position where we replaced $\beta$ by $\beta^j$ is preceded by the factor $\alpha$ (resp. succeeded by the factor $\gamma$) in $z$.

Next, we show that all the rules $r_1, \ldots, r_k$ now belong to $S$. For the sake of contradiction, suppose $r_i \notin S$ for some $i$ and, by symmetry, suppose this $i$-th splicing is of the form (i). Thus, the fourth component of $r_i$ has to be $\mu = \delta_1\alpha\beta\gamma\delta_2$. As $|\delta_1| \geq 5m > |x_i|$, the factor $\alpha\beta\gamma$ in $\mu$ is covered by $\tilde{z}$. Let $k$ be such that $\alpha\beta\gamma = z[k; k + m^2]$ is this factor in $z$. The pumping algorithm ensured that (a) $\alpha\beta^{j/2}$ is a factor of $\tilde{z}$ starting at position $z[k]$ or (b) $\beta^{j/2}\gamma$ is a factor of $\tilde{z}$ ending at position $z[k + m^2]$. As $j/2$ is very large and the splicing position of $(w_i, z_i) \vdash_{r_i} z_{i+1}$ is too close to the left end of $z_{i+1}$, case (b) is not possible. Thus, case (a) holds, the fourth component of $r_i$ overlaps in more than $|\alpha|$ letters with the left-most factor in $\tilde{z}$, and we used the replacement above, which ensured $r_i \in S$; a contradiction.

Let us summarize: if $x_1zy_1$ were in $L(I, S)$, then $w \in L(I, S)$ as well, which would contradict the choice of $w$. If $z_1 = x_1\tilde{z}y_1 \in I$, then $x_1zy_1$, which is at most as long as $z_1$, would belong to $I$ and we are done. We only have to consider the case when $(w_0, w'_0) \vdash_s z_1 = x_1\tilde{z}y_1$ and the splicing position lies within the factor $\tilde{z}$. We will show that, from this splicing, we derive another splicing $(\tilde{w}_0, \tilde{w}'_0) \vdash_t x_1zy_1$ which uses words from $L(I, S)$ and a rule from $S$ and, therefore, yields the contradiction.

Let $s = (u, v_1; u_2, v)$, $w_0 = xuv_1$, and $w'_0 = u_2vy$ where $|v_1|, |u_2| < m$, by Lemma 4.4 (here, $x$ and $y$ are newly chosen words). We have

$$z_1 = x_1\tilde{z}y_1 = xuvy$$

where $xu$ is a proper prefix of $x_1\tilde{z}$ and $vy$ is a proper suffix of $\tilde{z}y_1$.
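All splicing steps traced in this proof have the shape $(w_0, w'_0) \vdash_s z_1$ with $s = (u, v_1; u_2, v)$, $w_0 = xuv_1$, $w'_0 = u_2vy$ and $z_1 = xuvy$. A minimal Python sketch of one classic splicing step under this reading of the rule format follows; the function name and the tuple encoding of rules are hypothetical.

```python
# Illustrative sketch; the rule encoding and the function name are assumptions.
from typing import Set, Tuple

Rule = Tuple[str, str, str, str]  # classic rule (u1, v1; u2, v2)


def splice(w1: str, w2: str, rule: Rule) -> Set[str]:
    """All z with (w1, w2) |-_rule z, where w1 = x u1 v1 y, w2 = x' u2 v2 y'
    and z = x u1 v2 y' (every pair of occurrence sites is tried)."""
    u1, v1, u2, v2 = rule
    p1, p2 = u1 + v1, u2 + v2
    # Prefixes x u1 of w1, one per occurrence of u1 v1.
    lefts = {w1[:i + len(u1)] for i in range(len(w1) - len(p1) + 1)
             if w1.startswith(p1, i)}
    # Suffixes v2 y' of w2, one per occurrence of u2 v2.
    rights = {w2[k + len(u2):] for k in range(len(w2) - len(p2) + 1)
              if w2.startswith(p2, k)}
    return {left + right for left in lefts for right in rights}


if __name__ == "__main__":
    # Shape from the proof: w0 = x u v1, w0' = u2 v y gives x u v y, with
    # x = "aa", u = "b", v1 = "c", u2 = "d", v = "e", y = "ff".
    print(splice("aabc", "deff", ("b", "c", "d", "e")))  # {'aabeff'}
```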

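We will see next that if $s \notin S$, then we can use a rule $\tilde{s} \in S$ and maybe slightly modified words in order to obtain $z_1$ by splicing. If $s \notin S$, then $u = \mu$ or $v = \mu$. Suppose $u = \mu = \delta_1\alpha\beta\gamma\delta_2$. Thus, the factor $\alpha\beta\gamma$ of $\mu$ is covered by the factor $\tilde{z}$ in $z_1$, as $|\delta_1| \geq 5m > |x_1|$. Let $\alpha\beta\gamma = z[k; k + m^2]$ be this factor; then (a) $\alpha\beta^{j/2}$ is a factor of $\tilde{z}$ starting at position $z[k]$ or (b) $\beta^{j/2}\gamma$ is a factor of $\tilde{z}$ ending at position $z[k + m^2]$. If (b) holds, $\delta_1\alpha$ is a suffix of a word in $\beta^+$. We may write $\delta_1\alpha = \beta_2\beta^\ell$ where $\ell \geq 0$ and $\beta_2$ is a suffix of $\beta$. Replace $u$ by $\beta_2\gamma\delta_2$ and use this new rule $\tilde{s}$ in order to splice $(w_0, w'_0) \vdash_{\tilde{s}} z_1$. Note that the first component is now shorter than $\mu$. Otherwise, (a) holds and $\gamma\delta_2v$ is a prefix of a word in $\beta^+$. As $j$ is very large and $\gamma$ is a prefix of a word in $\beta^+$, we may extend $v$ (Lemma 4.2) such that we can write $\beta\gamma\delta_2 = \beta^{\ell_1}\beta_1$ and $v = \beta_2\beta^{\ell_2}\gamma$ where $\ell_1 \geq 1$, $\ell_2 \geq 0$, and $\beta_1\beta_2 = \beta$. Now, we pump down one of the $\beta$ in the first component and $\beta^{\ell_2}$ in the fourth component, and we let $\tilde{s} = (\delta_1\alpha\beta^{\ell_1-1}\beta_1, v_1; u_2, \beta_2\gamma) \sim s$. As all components are shorter than $\mu$, we see that $\tilde{s} \in S$ and

$$(x\delta_1\alpha\beta^{\ell_1-1}\beta_1v_1,\; u_2\beta_2\beta^{\ell_2+1}\gamma y) \vdash_{\tilde{s}} z_1,$$

i.e., we have shifted one of the occurrences of $\beta$ from $w_0$ to $w'_0$. Note that $\beta_2\gamma$ is a prefix of $\beta_2\beta^{\ell_2+1}\gamma$. Treating the fourth component analogously justifies the assumption that $s \in S$.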

Next, we will pump down the factors $\alpha\beta^j\gamma$ to $\alpha\beta\gamma$ in $\tilde{z}$ again. At every position where we pumped up before, we now pump down (in reverse order) in order to obtain the words $\tilde{x}, \tilde{u}, \tilde{v}, \tilde{y}$ from the words $x, u, v, y$, respectively. For each pumping step do:

1. If $u$ is covered by the factor $\alpha\beta^j\gamma$ (which we pump down in this step), extend $u$ to the left such that it becomes a prefix of $\alpha\beta^j\gamma$. Symmetrically, if $v$ is covered by the factor $\alpha\beta^j\gamma$, extend $v$ to the right such that it becomes a suffix of $\alpha\beta^j\gamma$ (Lemma 4.2). Observe that extension ensures that the factor $\alpha\beta^j\gamma$ is covered by either $xu$, $uv$, or $vy$.

2. If $\alpha\beta^j$ or $\beta^j\gamma$ is covered by one of $x$, $u$, $v$, or $y$, then replace this factor by $\alpha\beta$ or $\beta\gamma$, respectively. Otherwise, let us show how to pump when $\alpha\beta^j\gamma$ is covered by $xu$; the cases when $\alpha\beta^j\gamma$ is covered by $uv$ or $vy$ can be treated analogously. We can factorize $x = x'\alpha\beta^{j_1}\beta_1$ and $u = \beta_2\beta^{j_2}\gamma u'$ where $\beta_1\beta_2 = \beta$ and $j_1 + j_2 + 1 = j$. The pumping results are the words $x'\alpha\beta_1$ and $\beta_2\gamma u'$, respectively.
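The pump-down in step 2, for a factor $\alpha\beta^j\gamma$ that is split between $x$ and $u$, is plain string bookkeeping. The hypothetical helper below locates $\beta_1\beta_2 = \beta$ at the boundary and returns $x'\alpha\beta_1$ and $\beta_2\gamma u'$, whose concatenation is $x'\alpha\beta\gamma u'$. As an assumption of this sketch, the start index of the factor in $xu$ is passed in explicitly rather than searched for.

```python
# Illustrative sketch; the helper name and its parameters are hypothetical.
from typing import Tuple


def pump_down_across(x: str, u: str, start: int, alpha: str, beta: str,
                     gamma: str, j: int) -> Tuple[str, str]:
    """Pump the factor alpha beta^j gamma of x + u (starting at index `start`)
    down to alpha beta gamma when the boundary between x and u falls inside
    beta^j, i.e. x = x' alpha beta^{j1} beta1 and u = beta2 beta^{j2} gamma u'
    with beta1 beta2 = beta. Returns (x' alpha beta1, beta2 gamma u')."""
    w = x + u
    factor = alpha + beta * j + gamma
    if w[start:start + len(factor)] != factor:
        raise ValueError("alpha beta^j gamma does not start at index `start`")
    b0 = start + len(alpha)      # index where beta^j begins
    b1 = b0 + j * len(beta)      # index where beta^j ends
    if not b0 <= len(x) < b1:
        raise ValueError("the boundary between x and u is not inside beta^j")
    _j1, r = divmod(len(x) - b0, len(beta))
    beta1, beta2 = beta[:r], beta[r:]
    x_prime = x[:start]
    u_prime = u[b1 + len(gamma) - len(x):]
    return x_prime + alpha + beta1, beta2 + gamma + u_prime


if __name__ == "__main__":
    # x' = "zz", alpha = "a", beta = "bc", gamma = "d", j = 3, u' = "yy",
    # with the boundary after "zzabcb" (so beta1 = "b", beta2 = "c").
    print(pump_down_across("zzabcb", "cbcdyy", 2, "a", "bc", "d", 3))  # ('zzab', 'cdyy')
```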

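Observe that, after reversing all pumping steps, $\tilde{x}\tilde{u} \sim xu$, $\tilde{v}\tilde{y} \sim vy$, $\tilde{x}\tilde{u}\tilde{v}\tilde{y} = x_1zy_1$, and the rule $t = (\tilde{u}, v_1; u_2, \tilde{v})$ respects $L$. Furthermore, if we used extension for $u$ (or $v$) in one of the steps, then $|\tilde{u}| < m^2$ (resp. $|\tilde{v}| < m^2$); in any case $t \in S$. Recall that $w$ was chosen as the shortest word from $L \setminus L(I, S)$. As $|\tilde{x}\tilde{u}v_1|, |u_2\tilde{v}\tilde{y}| < |z| + 6m = |w|$, the words $\tilde{w}_0 = \tilde{x}\tilde{u}v_1$ and $\tilde{w}'_0 = u_2\tilde{v}\tilde{y}$ belong to $L(I, S)$, and as $(\tilde{w}_0, \tilde{w}'_0) \vdash_t x_1zy_1$, we conclude that $x_1zy_1$ as well as $w$ belong to $L(I, S)$, the desired contradiction. □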

Now, we can prove our main result. 

Proof of Theorem 4.1. Recall that for a splicing language $L$ with $m = |M_L|$ we intend to prove that the splicing system $(I, R)$ with $I = \Sigma^{<m^2+6m} \cap L$ and

$$R = \{ r \in \Sigma^{<m^2+10m} \times \Sigma^{<2m} \times \Sigma^{<2m} \times \Sigma^{<m^2+10m} \mid r \text{ respects } L \}$$

generates the language $L = L(I, R)$.

Obviously, $L(I, R) \subseteq L$. By Lemma 4.8, we may assume that $L$ is generated by a splicing system $(J, S)$ where

$$S = \{ r \in (\Sigma^{<m^2+10m})^4 \mid r \text{ respects } L \}.$$

In order to prove $L \subseteq L(I, R)$, we use induction on the length of words in $L$. For $w \in L$ with $|w| < m^2 + 6m$, by definition, $w \in I \subseteq L(I, R)$.

Now, consider $w \in L$ with $|w| \geq m^2 + 6m$. The induction hypothesis states that every word $w' \in L$ with $|w'| < |w|$ belongs to $L(I, R)$. Factorize $w = x\alpha\beta\gamma\delta y$ such that $|x| = |y| = 3m$, $|\alpha\beta\gamma| = m^2$, $\beta \neq \varepsilon$, $\alpha \sim \alpha\beta$, and $\gamma \sim \beta\gamma$ (Lemma 2.1).

The proof idea is similar to that of Lemma 4.8. We use a pumping argument on $\beta$ in order to obtain a very long word. This word has to be created by a series of splicings in $(J, S)$. We show that these splicings can be modified in order to create $w$ by splicing from a set of strictly shorter words and with rules from $R$. Then, the induction hypothesis yields $w \in L(I, R)$.

Choose $j$ sufficiently large ($j > |w| + m^2 + 10m$ and $J$ does not contain words of length $j$ or more). We let $z = \alpha\beta^j\gamma\delta$ and investigate the creation of $xzy \in L$ by splicing in $(J, S)$. As $z$ is not a factor of a word in $J$, we can trace back the creation of $xzy$ by splicing to the point where the factor $z$ is affected for the last time. Let $z_{k+1} = x_{k+1}zy_{k+1}$, where $x_{k+1} = x$ and $y_{k+1} = y$, be created by $k$ splicings from a word $z_1 = x_1zy_1$ which is created by a splicing $(w_0, w'_0) \vdash_s z_1$ with $w_0, w'_0 \in L$, $s \in S$, and the splicing position lies in the factor $z$. Furthermore, for $i = 1, \ldots, k$ the intermediate splicings are either

(i) $(w_i, z_i) \vdash_{r_i} x_{i+1}zy_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in S$, $y_{i+1} = y_i$, and the splicing position lies to the left of the factor $z$, or

(ii) $(z_i, w_i) \vdash_{r_i} x_{i+1}zy_{i+1} = z_{i+1}$, $w_i \in L$, $r_i \in S$, $x_{i+1} = x_i$, and the splicing position lies to the right of the factor $z$.

As $|z| > m^2 + 10m$, we can apply Lemma 4.7. Thus, we may assume $w_1, \ldots, w_k \in I$, $r_1, \ldots, r_k \in R$, and $|x_i|, |y_i| < 5m$.

Consider a rule $r_i$ in a splicing of the form (i). Suppose the fourth component of $r_i$ covers a prefix of the factor $\alpha\beta^j$ in $z$ which is longer than $\alpha\beta$ (as $j$ is very large, it cannot fully cover $\alpha\beta^j$). By extension (Lemma 4.2), we may write $r_i = (u_1, v_1; u_2, v'\alpha\beta^\ell)$ for some $\ell \geq 1$. By Lemma 4.3 and as $\alpha \sim \alpha\beta$, we may replace this rule by $(u_1, v_1; u_2, v'\alpha) \in R$. Moreover, after we have treated rules of form (ii) symmetrically, these new rules $r_1, \ldots, r_k$ and the words $w_1, \ldots, w_k$ can be used in order to create $w = x_{k+1}\alpha\beta\gamma\delta y_{k+1}$ from $x_1\alpha\beta\gamma\delta y_1$ by splicing. Thus, if $x_1\alpha\beta\gamma\delta y_1$ belongs to $L(I, R)$, so does $w$.

Now, consider the first splicing $(w_0, w'_0) \vdash_s z_1 = x_1zy_1$. By Lemma 4.4, let $s = (u, v_1; u_2, v)$ such that $w_0 = xuv_1$, $w'_0 = u_2vy$, and $|v_1|, |u_2| < m$ (here, $x$ and $y$ are newly chosen words). Hence,

$$z_1 = xuvy = x_1zy_1 = x_1\alpha\beta^j\gamma\delta y_1$$

where $xu$ is a proper prefix of $x_1z$ and $vy$ is a proper suffix of $zy_1$.

Next, we will pump down the factor $\alpha\beta^j\gamma$ to $\alpha\beta\gamma$ in $z$ again in order to obtain the words $\tilde{x}, \tilde{u}, \tilde{v}, \tilde{y}$ from the words $x, u, v, y$, respectively. The pumping is done as in the proof of Lemma 4.8:

1. If $u$ is covered by the factor $\alpha\beta^j\gamma$, extend $u$ to the left such that it becomes a prefix of $\alpha\beta^j\gamma$. Symmetrically, if $v$ is covered by the factor $\alpha\beta^j\gamma$, extend $v$ to the right such that it becomes a suffix of $\alpha\beta^j\gamma$ (Lemma 4.2). Observe that extension ensures that the factor $\alpha\beta^j\gamma$ is covered by either $xu$, $uv$, or $vy$.

2. If $\alpha\beta^j$ or $\beta^j\gamma$ is covered by one of $x$, $u$, $v$, or $y$, then replace this factor by $\alpha\beta$ or $\beta\gamma$, respectively. Otherwise, let us show how to pump when $\alpha\beta^j\gamma$ is covered by $xu$; the cases when $\alpha\beta^j\gamma$ is covered by $uv$ or $vy$ can be treated analogously. We can factorize $x = x'\alpha\beta^{j_1}\beta_1$ and $u = \beta_2\beta^{j_2}\gamma u'$ where $\beta_1\beta_2 = \beta$ and $j_1 + j_2 + 1 = j$. The pumping results are the words $x'\alpha\beta_1$ and $\beta_2\gamma u'$, respectively.

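Observe that $\tilde{x}\tilde{u} \sim xu$, $\tilde{v}\tilde{y} \sim vy$, $\tilde{x}\tilde{u}\tilde{v}\tilde{y} = x_1\alpha\beta\gamma\delta y_1$, and the rule $t = (\tilde{u}, v_1; u_2, \tilde{v})$ respects $L$. Furthermore, if we used extension for $u$ (or $v$), then $|\tilde{u}| < m^2$ (resp. $|\tilde{v}| < m^2$). No matter whether we used extension, $t \in R$. As $|\tilde{x}\tilde{u}v_1|, |u_2\tilde{v}\tilde{y}| < |\alpha\beta\gamma\delta| + 6m = |w|$ and by the induction hypothesis, the words $\tilde{w}_0 = \tilde{x}\tilde{u}v_1$ and $\tilde{w}'_0 = u_2\tilde{v}\tilde{y}$ belong to $L(I, R)$. We conclude that $(\tilde{w}_0, \tilde{w}'_0) \vdash_t x_1\alpha\beta\gamma\delta y_1 \in L(I, R)$ and, therefore, $w = x_{k+1}\alpha\beta\gamma\delta y_{k+1} \in L(I, R)$ as well. □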

5 Decidability 

The main question we intended to answer when starting our investigation was whether or not it is decidable if a given regular language $L$ is a splicing language. If we can decide whether a splicing rule respects a regular language and if we can construct a (non-deterministic) finite automaton accepting the language generated by a given splicing system, then we can decide whether $L$ is a classic splicing language (Pixton splicing language) as follows. We compute the splicing system $(I, R)$ as given in Theorem 4.1 (resp. Theorem 3.1) and we compute a finite automaton accepting the splicing language $L(I, R)$. Theorem 4.1 (resp. Theorem 3.1) implies that $L$ is a splicing language if and only if $L = L(I, R)$. Recall that equivalence of regular languages is decidable, e.g., by constructing and comparing the minimal deterministic finite automata of both languages.
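For the final comparison $L = L(I, R)$, an alternative to comparing minimal DFAs is a breadth-first search over the product automaton. The sketch below assumes both languages are given by complete DFAs over the same alphabet; an NFA for $L(I, R)$ would first have to be determinized (e.g., by the subset construction), which can cost exponentially many states. The encoding and names are hypothetical.

```python
# Illustrative sketch; the DFA encoding and names are assumptions.
from collections import deque
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[int, str], int]
DFA = Tuple[Delta, int, Set[int]]  # (transitions, start state, final states)


def dfa_equivalent(alphabet: str, d1: DFA, d2: DFA) -> bool:
    """Decide L(d1) = L(d2) for complete DFAs over the same alphabet by a BFS
    over the reachable part of the product automaton."""
    (delta1, s1, f1), (delta2, s2, f2) = d1, d2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False  # the word leading to (p, q) is in exactly one language
        for c in alphabet:
            nxt = (delta1[(p, c)], delta2[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Two automata for "even number of a's" over {a, b}: a 2-state one and a
# 4-state one that additionally tracks the parity of b's (bit 0).
EVEN_A: DFA = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
EVEN_A_4: DFA = ({**{(s, "a"): s ^ 2 for s in range(4)},
                  **{(s, "b"): s ^ 1 for s in range(4)}}, 0, {0, 1})
ODD_A: DFA = (EVEN_A[0], 0, {1})

if __name__ == "__main__":
    print(dfa_equivalent("ab", EVEN_A, EVEN_A_4))  # True
    print(dfa_equivalent("ab", EVEN_A, ODD_A))     # False
```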

It is known from [8, 13] that it is decidable whether a classic splicing rule respects a regular 
language. Furthermore, there is an effective construction of a finite automaton which accepts 
the language generated by a Pixton splicing system [17]. As mentioned earlier, Pixton splicing 
systems are more general than classic splicing systems, which means the latter result applies to 
classic splicing systems, too. Such a construction for classic splicing systems is also given in [12]. 

Let us prove that it is decidable whether a Pixton splicing rule $r$ respects a regular language $L$. Actually, we will decide whether the set $[r]_L$ respects $L$, which is equivalent by Lemma 4.3. The proof can easily be adapted in order to prove that it is decidable whether a classic splicing rule respects $L$.

Lemma 5.1. Let $L$ be a regular language and let $r$ be a Pixton splicing rule. It is decidable whether $r$ respects $L$.

Proof. Let $\sim$ denote the equivalence relation $\sim_L$ and $[\cdot]$ denote the corresponding equivalence classes $[\cdot]_L$.

Let $r = (u_1, u_2; v)$. We define the two sets $S_1, S_2 \subseteq M_L$ as

$$S_1 = \{ X \in M_L \mid \exists Y \colon X[u_1]Y \subseteq L \}, \quad S_2 = \{ Y \in M_L \mid \exists X \colon X[u_2]Y \subseteq L \},$$

i.e., $[x_1]$ belongs to $S_1$ if and only if $x_1u_1y_1 \in L$ for some word $y_1$, and $[y_2]$ belongs to $S_2$ if and only if $x_2u_2y_2 \in L$ for some word $x_2$. We claim that $r$ respects $L$ if and only if $X[v]Y \subseteq L$ for all $X \in S_1$ and $Y \in S_2$, which is a property that can easily be decided.

Firstly, suppose $r$ respects $L$. For $X \in S_1$ and $Y \in S_2$ choose words $x_1 \in X$ and $y_2 \in Y$. By definition of $S_1$ and $S_2$, there are words $y_1$ and $x_2$ such that $x_iu_iy_i \in L$ for $i = 1, 2$ and, as $r$ respects $L$, $x_1vy_2 \in L$. This implies $X[v]Y \subseteq L$.

Vice versa, suppose $X[v]Y \subseteq L$ for all $X \in S_1$ and $Y \in S_2$. For all $x_iu_iy_i \in L$ with $i = 1, 2$, we have $[x_1] \in S_1$ and $[y_2] \in S_2$. Therefore, $x_1vy_2 \in [x_1][v][y_2] \subseteq L$ and $r$ respects $L$. □
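The criterion in the proof of Lemma 5.1 translates directly into a procedure. The Python sketch below assumes $L$ is given by a complete DFA and represents monoid elements by state transformations. For the minimal DFA this is the syntactic monoid $M_L$; a larger transition monoid also works, because the argument above only uses that $L$ is a union of classes of a congruence. A class $X[w]Y$ lies in $L$ exactly when reading a representative from the start state ends in a final state. For the classic case, the sketch reads a classic rule $(u_1, v_1; u_2, v_2)$ as producing $x_1u_1v_2y_2$ from $x_1u_1v_1y_1$ and $x_2u_2v_2y_2$, so it respects $L$ iff the Pixton rule $(u_1v_1, u_2v_2; u_1v_2)$ does; this reduction, the encoding, and all names are assumptions of the sketch.

```python
# Illustrative sketch of the decision procedure; names and encoding are assumptions.
from collections import deque
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[int, str], int]
Trans = Tuple[int, ...]


def transformation(delta: Delta, n_states: int, word: str) -> Trans:
    """The map q -> delta*(q, word), as a tuple indexed by state."""
    f = list(range(n_states))
    for c in word:
        f = [delta[(q, c)] for q in f]
    return tuple(f)


def transition_monoid(delta: Delta, n_states: int, alphabet: str) -> Set[Trans]:
    """All transformations induced by words, found by BFS from the empty word."""
    identity: Trans = tuple(range(n_states))
    seen = {identity}
    queue = deque([identity])
    while queue:
        f = queue.popleft()
        for c in alphabet:
            g = tuple(delta[(p, c)] for p in f)  # transformation of (word of f) + c
            if g not in seen:
                seen.add(g)
                queue.append(g)
    return seen


def pixton_rule_respects(delta: Delta, n_states: int, alphabet: str, start: int,
                         finals: Set[int], u1: str, u2: str, v: str) -> bool:
    """Decide whether the Pixton rule (u1, u2; v) respects L, as in Lemma 5.1."""
    monoid = transition_monoid(delta, n_states, alphabet)
    U1, U2, V = (transformation(delta, n_states, w) for w in (u1, u2, v))

    def in_L(X: Trans, T: Trans, Y: Trans) -> bool:
        # The class X[w]Y (w inducing T) lies in L iff x w y is accepted.
        return Y[T[X[start]]] in finals

    S1 = [X for X in monoid if any(in_L(X, U1, Y) for Y in monoid)]
    S2 = [Y for Y in monoid if any(in_L(X, U2, Y) for X in monoid)]
    return all(in_L(X, V, Y) for X in S1 for Y in S2)


def classic_rule_respects(delta: Delta, n_states: int, alphabet: str, start: int,
                          finals: Set[int], u1: str, v1: str, u2: str, v2: str) -> bool:
    """Classic rule (u1, v1; u2, v2) via the Pixton rule (u1 v1, u2 v2; u1 v2)."""
    return pixton_rule_respects(delta, n_states, alphabet, start, finals,
                                u1 + v1, u2 + v2, u1 + v2)


# Minimal DFA for words over {a, b} without the factor "aa" (state 2 is a sink).
NO_AA: Delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2,
                (1, "b"): 0, (2, "a"): 2, (2, "b"): 2}

if __name__ == "__main__":
    args = (NO_AA, 3, "ab", 0, {0, 1})
    print(pixton_rule_respects(*args, "b", "b", "b"))      # True
    print(pixton_rule_respects(*args, "b", "b", "aa"))     # False
    print(classic_rule_respects(*args, "", "b", "b", ""))  # False
```

In the last example, $x_1 = a$ and $y_2 = a$ are admissible, but $x_1y_2 = aa \notin L$.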

These observations lead to the decidability results. 

Corollary 5.2.

i.) For a given regular language $L$, it is decidable whether or not $L$ is a classic splicing language. Moreover, if $L$ is a classic splicing language, a splicing system $(I, R)$ generating $L$ can be effectively constructed.

ii.) For a given regular language $L$, it is decidable whether or not $L$ is a Pixton splicing language. Moreover, if $L$ is a Pixton splicing language, a splicing system $(I, R)$ generating $L$ can be effectively constructed.

Final Remarks 

It has been known since 1991 that the class $\mathcal{S}$ of languages that can be generated by a splicing system is a proper subclass of the class of regular languages. However, to date, no other natural characterization of the class $\mathcal{S}$ exists. The problem of deciding whether a regular language is generated by a splicing system is a fundamental problem in this context and has remained unsolved. To the best of our knowledge, the problem was first stated in the literature in 1998 [11]. In this paper we solved this long-standing open problem.

Regarding the complexity of the decision algorithm, let $L$ be a regular language given as syntactic monoid $M_L$ and let $(I, R)$ be the splicing system described in Theorem 4.1 (resp. Theorem 3.1). An automaton which accepts $L(I, R)$ and is created as described in Section 5 has a state set whose size is exponential in $m$, where $m = |M_L|$. Deciding the equivalence of two regular languages, given as NFAs, is known to be PSPACE-complete [20]; hence, the naive approach to decide whether or not $L = L(I, R)$ uses double exponential time in $m$. As there may be an exponential gap between an NFA accepting $L$ and the syntactic monoid $M_L$, the complexity, when considering an NFA as input, becomes triple exponential. Improving the complexity of the algorithm is a subject of future research.